Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Jungtaek Lim
Thanks Yuanjian for your support!

I've left a comment, but to replicate here - I agree with your point. It's
really hard for a new feature to be stable from the initial version, and
we might need to decide on breaking backward compatibility for
(semantic) bug fixes/improvements. Maybe we could mark the data source as
incubating/experimental and wait for a couple of minor releases to see
whether the options/behaviors can be finalized.

On Wed, Oct 18, 2023 at 4:24 PM Yuanjian Li  wrote:

> +1, I have no issues with the practicality and value of this feature
> itself.
> I've left some comments concerning ongoing maintenance and
> compatibility-related matters, which we can continue to discuss.
>
On Tue, Oct 17, 2023 at 05:23, Jungtaek Lim wrote:
>
>> Thanks Bartosz and Anish for your support!
>>
>> I'll wait for a couple more days to see whether we can hear more voices
>> on this. We could probably look into initiating a VOTE thread if there is no
>> objection.
>>
>> On Tue, Oct 17, 2023 at 5:48 AM Anish Shrigondekar <
>> anish.shrigonde...@databricks.com> wrote:
>>
>>> Hi Jungtaek,
>>>
>>> Thanks for putting this together. +1 from me and looks good overall.
>>> Posted some minor comments/questions to the doc.
>>>
>>> Thanks,
>>> Anish
>>>
>>> On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny <
>>> bartkoniec...@gmail.com> wrote:
>>>
 Thank you, Jungtaek, for your answers! It's clear now.

 +1 for me. It seems like a prerequisite for further ops-related
 improvements to state store management. I especially mean here the state
 rebalancing that could rely on this read+write state store API. I don't
 mean the dynamic state rebalancing, which could probably be implemented
 with lower latency directly in the stateful API; instead, I'm thinking of
 an offline job to rebalance the state and later restart the stateful
 pipeline with a changed number of shuffle partitions.

 Best,
 Bartosz.

 On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> bump for better reach
>
> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> Sorry, please use this link instead for SPIP doc:
>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>
>>
>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi dev,
>>>
>>> I'd like to start a discussion on "State Data Source - Reader".
>>>
>>> This proposal aims to introduce a new data source "statestore" which
>>> enables reading the state rows from an existing checkpoint via an offline
>>> (batch) query. This will enable users to 1) create unit tests against a
>>> stateful query verifying the state value (especially
>>> flatMapGroupsWithState), and 2) gather more context on the status when an
>>> incident occurs, especially for incorrect output.
>>>
>>> *SPIP*:
>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>
>>> Looking forward to your feedback!
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>> since the writer requires us to consider more cases. We are planning on 
>>> it.
>>>
>>

 --
 Bartosz Konieczny
 freelance data engineer
 https://www.waitingforcode.com
 https://github.com/bartosz25/
 https://twitter.com/waitingforcode




Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Yuanjian Li
+1, I have no issues with the practicality and value of this feature itself.
I've left some comments concerning ongoing maintenance and
compatibility-related matters, which we can continue to discuss.

On Tue, Oct 17, 2023 at 05:23, Jungtaek Lim wrote:

> Thanks Bartosz and Anish for your support!
>
> I'll wait for a couple more days to see whether we can hear more voices on
> this. We could probably look into initiating a VOTE thread if there is no
> objection.
>
> On Tue, Oct 17, 2023 at 5:48 AM Anish Shrigondekar <
> anish.shrigonde...@databricks.com> wrote:
>
>> Hi Jungtaek,
>>
>> Thanks for putting this together. +1 from me and looks good overall.
>> Posted some minor comments/questions to the doc.
>>
>> Thanks,
>> Anish
>>
>> On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny <
>> bartkoniec...@gmail.com> wrote:
>>
>>> Thank you, Jungtaek, for your answers! It's clear now.
>>>
>>> +1 for me. It seems like a prerequisite for further ops-related
>>> improvements to state store management. I especially mean here the state
>>> rebalancing that could rely on this read+write state store API. I don't
>>> mean the dynamic state rebalancing, which could probably be implemented
>>> with lower latency directly in the stateful API; instead, I'm thinking of
>>> an offline job to rebalance the state and later restart the stateful
>>> pipeline with a changed number of shuffle partitions.
>>>
>>> Best,
>>> Bartosz.
>>>
>>> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 bump for better reach

 On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> Sorry, please use this link instead for SPIP doc:
> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>
>
> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> Hi dev,
>>
>> I'd like to start a discussion on "State Data Source - Reader".
>>
>> This proposal aims to introduce a new data source "statestore" which
>> enables reading the state rows from an existing checkpoint via an offline
>> (batch) query. This will enable users to 1) create unit tests against a
>> stateful query verifying the state value (especially
>> flatMapGroupsWithState), and 2) gather more context on the status when an
>> incident occurs, especially for incorrect output.
>>
>> *SPIP*:
>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>
>> Looking forward to your feedback!
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> ps. The scope of the project is narrowed to the reader in this SPIP,
>> since the writer requires us to consider more cases. We are planning on 
>> it.
>>
>
>>>
>>> --
>>> Bartosz Konieczny
>>> freelance data engineer
>>> https://www.waitingforcode.com
>>> https://github.com/bartosz25/
>>> https://twitter.com/waitingforcode
>>>
>>>


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
Thanks Bartosz and Anish for your support!

I'll wait for a couple more days to see whether we can hear more voices on
this. We could probably look into initiating a VOTE thread if there is no
objection.

On Tue, Oct 17, 2023 at 5:48 AM Anish Shrigondekar <
anish.shrigonde...@databricks.com> wrote:

> Hi Jungtaek,
>
> Thanks for putting this together. +1 from me and looks good overall.
> Posted some minor comments/questions to the doc.
>
> Thanks,
> Anish
>
> On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny <
> bartkoniec...@gmail.com> wrote:
>
>> Thank you, Jungtaek, for your answers! It's clear now.
>>
>> +1 for me. It seems like a prerequisite for further ops-related
>> improvements to state store management. I especially mean here the state
>> rebalancing that could rely on this read+write state store API. I don't
>> mean the dynamic state rebalancing, which could probably be implemented
>> with lower latency directly in the stateful API; instead, I'm thinking of
>> an offline job to rebalance the state and later restart the stateful
>> pipeline with a changed number of shuffle partitions.
>>
>> Best,
>> Bartosz.
>>
>> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> bump for better reach
>>>
>>> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Sorry, please use this link instead for SPIP doc:
 https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing


 On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> Hi dev,
>
> I'd like to start a discussion on "State Data Source - Reader".
>
> This proposal aims to introduce a new data source "statestore" which
> enables reading the state rows from an existing checkpoint via an offline
> (batch) query. This will enable users to 1) create unit tests against a
> stateful query verifying the state value (especially
> flatMapGroupsWithState), and 2) gather more context on the status when an
> incident occurs, especially for incorrect output.
>
> *SPIP*:
> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>
> Looking forward to your feedback!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> ps. The scope of the project is narrowed to the reader in this SPIP,
> since the writer requires us to consider more cases. We are planning on 
> it.
>

>>
>> --
>> Bartosz Konieczny
>> freelance data engineer
>> https://www.waitingforcode.com
>> https://github.com/bartosz25/
>> https://twitter.com/waitingforcode
>>
>>


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Anish Shrigondekar
Hi Jungtaek,

Thanks for putting this together. +1 from me and looks good overall. Posted
some minor comments/questions to the doc.

Thanks,
Anish

On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny 
wrote:

> Thank you, Jungtaek, for your answers! It's clear now.
>
> +1 for me. It seems like a prerequisite for further ops-related
> improvements to state store management. I especially mean here the state
> rebalancing that could rely on this read+write state store API. I don't
> mean the dynamic state rebalancing, which could probably be implemented
> with lower latency directly in the stateful API; instead, I'm thinking of
> an offline job to rebalance the state and later restart the stateful
> pipeline with a changed number of shuffle partitions.
>
> Best,
> Bartosz.
>
> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim 
> wrote:
>
>> bump for better reach
>>
>> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Sorry, please use this link instead for SPIP doc:
>>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>>
>>>
>>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Hi dev,

 I'd like to start a discussion on "State Data Source - Reader".

 This proposal aims to introduce a new data source "statestore" which
 enables reading the state rows from an existing checkpoint via an offline
 (batch) query. This will enable users to 1) create unit tests against a
 stateful query verifying the state value (especially
 flatMapGroupsWithState), and 2) gather more context on the status when an
 incident occurs, especially for incorrect output.

 *SPIP*:
 https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
 *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511

 Looking forward to your feedback!

 Thanks,
 Jungtaek Lim (HeartSaVioR)

 ps. The scope of the project is narrowed to the reader in this SPIP,
 since the writer requires us to consider more cases. We are planning on it.

>>>
>
> --
> Bartosz Konieczny
> freelance data engineer
> https://www.waitingforcode.com
> https://github.com/bartosz25/
> https://twitter.com/waitingforcode
>
>


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Bartosz Konieczny
Thank you, Jungtaek, for your answers! It's clear now.

+1 for me. It seems like a prerequisite for further ops-related
improvements to state store management. I especially mean here the state
rebalancing that could rely on this read+write state store API. I don't
mean the dynamic state rebalancing, which could probably be implemented
with lower latency directly in the stateful API; instead, I'm thinking of
an offline job to rebalance the state and later restart the stateful
pipeline with a changed number of shuffle partitions.

Best,
Bartosz.

On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim 
wrote:

> bump for better reach
>
> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim 
> wrote:
>
>> Sorry, please use this link instead for SPIP doc:
>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>
>>
>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi dev,
>>>
>>> I'd like to start a discussion on "State Data Source - Reader".
>>>
>>> This proposal aims to introduce a new data source "statestore" which
>>> enables reading the state rows from an existing checkpoint via an offline
>>> (batch) query. This will enable users to 1) create unit tests against a
>>> stateful query verifying the state value (especially
>>> flatMapGroupsWithState), and 2) gather more context on the status when an
>>> incident occurs, especially for incorrect output.
>>>
>>> *SPIP*:
>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>
>>> Looking forward to your feedback!
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>> since the writer requires us to consider more cases. We are planning on it.
>>>
>>

-- 
Bartosz Konieczny
freelance data engineer
https://www.waitingforcode.com
https://github.com/bartosz25/
https://twitter.com/waitingforcode


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
bump for better reach

On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim 
wrote:

> Sorry, please use this link instead for SPIP doc:
> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>
>
> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim 
> wrote:
>
>> Hi dev,
>>
>> I'd like to start a discussion on "State Data Source - Reader".
>>
>> This proposal aims to introduce a new data source "statestore" which
>> enables reading the state rows from an existing checkpoint via an offline
>> (batch) query. This will enable users to 1) create unit tests against a
>> stateful query verifying the state value (especially
>> flatMapGroupsWithState), and 2) gather more context on the status when an
>> incident occurs, especially for incorrect output.
>>
>> *SPIP*:
>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>
>> Looking forward to your feedback!
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> ps. The scope of the project is narrowed to the reader in this SPIP,
>> since the writer requires us to consider more cases. We are planning on it.
>>
>


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-12 Thread Jungtaek Lim
Sorry, please use this link instead for SPIP doc:
https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing


On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim 
wrote:

> Hi dev,
>
> I'd like to start a discussion on "State Data Source - Reader".
>
> This proposal aims to introduce a new data source "statestore" which
> enables reading the state rows from an existing checkpoint via an offline
> (batch) query. This will enable users to 1) create unit tests against a
> stateful query verifying the state value (especially
> flatMapGroupsWithState), and 2) gather more context on the status when an
> incident occurs, especially for incorrect output.
>
> *SPIP*:
> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>
> Looking forward to your feedback!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> ps. The scope of the project is narrowed to the reader in this SPIP, since
> the writer requires us to consider more cases. We are planning on it.
>


[DISCUSS] SPIP: State Data Source - Reader

2023-10-12 Thread Jungtaek Lim
Hi dev,

I'd like to start a discussion on "State Data Source - Reader".

This proposal aims to introduce a new data source "statestore" which
enables reading the state rows from an existing checkpoint via an offline
(batch) query. This will enable users to 1) create unit tests against a
stateful query verifying the state value (especially
flatMapGroupsWithState), and 2) gather more context on the status when an
incident occurs, especially for incorrect output.

*SPIP*:
https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
*JIRA*: https://issues.apache.org/jira/browse/SPARK-45511

Looking forward to your feedback!

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. The scope of the project is narrowed to the reader in this SPIP, since
the writer requires us to consider more cases. We are planning on it.
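
For illustration, here is a minimal sketch of how the proposed reader could
look from the user side. The format name "statestore" comes from the proposal
itself; the option names (path, batchId, operatorId) and the output shape are
assumptions for illustration, not a finalized API:

// Hedged sketch, not a final API: option names below are assumptions.
val stateDf = spark.read
  .format("statestore")                              // format name proposed in the SPIP
  .option("path", "/checkpoints/my-stateful-query")  // checkpoint root of the streaming query
  .option("batchId", "42")                           // hypothetical: micro-batch version to read
  .option("operatorId", "0")                         // hypothetical: which stateful operator
  .load()

// Intended usage per the proposal: assert on state values in unit tests
// (e.g. for flatMapGroupsWithState), or inspect state during incident analysis.
stateDf.show(truncate = false)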


Re: Watermark on late data only

2023-10-10 Thread Raghu Angadi
I'd like some way to expose watermarks to the user. The watermark does
affect the processing of the records, so it is relevant for users.
`current_watermark()` is a good option.
The implementation of this might be engine-specific, but it is a very
relevant concept for authors of streaming pipelines.
Ideally, I would like the engine to drop late records (or write them to a
side output) even for stateless pipelines, for consistency.

On Tue, Oct 10, 2023 at 2:27 AM Bartosz Konieczny 
wrote:

> Thank you for the clarification, Jungtaek. Indeed, it doesn't sound like
> a highly demanded feature from the end users; I haven't seen it a lot on
> StackOverflow or mailing lists. I was just curious about the reasons.
>
> Using arbitrary stateful processing could indeed be a workaround! But
> IMHO it would be easier to expose this watermark value from a function like
> current_watermark() and let the users do anything with the data. And it
> wouldn't require dealing with the state store overhead. The function
> could simplify implementing the *side output pattern*, where we could
> process the on-time data differently from the late data, e.g. write late
> data to a dedicated space in the lake and facilitate backfilling for
> the batch pipelines.
>
> With the current_watermark function it could be expressed as simply:
>
> streamDataset.foreachBatch((dataframe, batchVersion) => {
>   dataframe.cache()
>   dataframe.filter(current_watermark() >
> event_time_from_dataframe).writeTo("late_data")
>   dataframe.filter(current_watermark() <=
> event_time_from_dataframe).writeTo("on_time_data")
> })
>
> A little like what you can do with Apache Flink, in fact:
>
> https://github.com/immerok/recipes/blob/main/late-data-to-sink/src/main/java/com/immerok/cookbook/LateDataToSeparateSink.java#L81
>
> WDYT?
>
> Best,
> Bartosz.
>
> PS. Will be happy to contribute on that if the feature does make sense ;)
>
> On Tue, Oct 10, 2023 at 3:23 AM Jungtaek Lim 
> wrote:
>
>> Technically speaking, "late data" represents data which cannot be
>> processed because the engine has already thrown out the state associated
>> with it.
>>
>> That said, the only reason the watermark exists for streaming is to
>> handle stateful operators. From the engine's point of view, there is no
>> concept of "late data" for a stateless query. It's something users have to
>> handle with "filter" by themselves, without relying on the value of the
>> watermark. I guess someone may see some benefit in automatically tracking
>> the trend of event time and want to define late data based on the watermark
>> even in a stateless query, but personally I haven't heard such a request so
>> far.
>>
>> As a workaround you can leverage flatMapGroupsWithState, which provides
>> the value of the watermark for you, but I'd agree it's too heavyweight just
>> for this. If we see consistent demand for it, we could probably look into
>> it and maybe introduce a new SQL function (which works only on streaming -
>> that's probably a major blocker on introduction) for it.
>>
>> On Mon, Oct 9, 2023 at 11:03 AM Bartosz Konieczny <
>> bartkoniec...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've been analyzing the watermark propagation added in 3.5.0
>>> recently and had to return to the basics of watermarks. One question is
>>> still unanswered in my head.
>>>
>>> Why are watermarks reserved for stateful queries? Can't they apply to
>>> filtering late data out only?
>>>
>>> Is the reason only historical, in that the initial design doc
>>>
>>> mentions aggregated queries exclusively? Or are there any technical
>>> limitations why jobs written like the one below don't drop late data
>>> automatically?
>>>
>>> import sparkSession.implicits._
>>> implicit val sparkContext = sparkSession.sqlContext
>>> val clicksStream = MemoryStream[Click]
>>> val clicksWithWatermark = clicksStream.toDF
>>>   .withWatermark("clickTime", "10 minutes")
>>> val query =
>>> clicksWithWatermark.writeStream.format("console").option("truncate", false)
>>>   .start()
>>>
>>> clicksStream.addData(Seq(
>>>   Click(1, Timestamp.valueOf("2023-06-10 10:10:00")),
>>>   Click(2, Timestamp.valueOf("2023-06-10 10:12:00")),
>>>   Click(3, Timestamp.valueOf("2023-06-10 10:14:00"))
>>> ))
>>>
>>>
>>> query.processAllAvailable()
>>>
>>> clicksStream.addData(Seq(
>>>   Click(4, Timestamp.valueOf("2023-06-10 11:00:40")),
>>>   Click(5, Timestamp.valueOf("2023-06-10 11:00:30")),
>>>   Click(6, Timestamp.valueOf("2023-06-10 11:00:10")),
>>>   Click(10, Timestamp.valueOf("2023-06-10 10:00:10"))
>>> ))
>>> query.processAllAvailable()
>>>
>>> One quick implementation could be adding a new physical plan rule to the
>>> IncrementalExecution
>>> 
>>> for the 

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
Slight correction/clarification: we now take the "previous" watermark to
determine the late record, because such records are valid inputs for
non-first stateful operators; dropping records based on the same criteria
would drop valid records coming from previous (upstream) stateful operators.
Please look back at which criteria we use for evicting states - evicted
state can become output of the operator.
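
To make that concrete, here is a small worked illustration with assumed
numbers (the values and window are invented; this is not tied to any real
API):

// Watermark advanced from 10:00 (previous) to 10:05 (current) in this micro-batch.
val previousWatermark = "10:00"
val currentWatermark  = "10:05"

// An upstream windowed aggregation evicts - and therefore emits - the window
// [09:58, 10:03) in this batch: its end 10:03 is <= the current watermark 10:05,
// and it was not evicted earlier because 10:03 > the previous watermark 10:00.
val emittedEventTime = "10:03"

// For the downstream (non-first) stateful operator this row is a valid input, yet:
assert(emittedEventTime < currentWatermark)   // dropped if filtering with the current watermark
assert(emittedEventTime > previousWatermark)  // kept when filtering with the previous watermark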

On Tue, Oct 10, 2023 at 8:10 PM Jungtaek Lim 
wrote:

> We wouldn't like to expose the internal mechanism to the public.
>
> As you are a very detail-oriented engineer tracking major changes, you
> might have noticed that we "changed" the definition of a late record while
> fixing late record handling. Previously a late record was defined as a
> record whose event time timestamp is earlier than the "current" watermark.
> How has it changed? We now take the "previous" watermark to determine the
> late record, because such records are valid inputs for non-first stateful
> operators. If we were exposing the function current_watermark(), which
> provides the current watermark, and users somehow built a side-output based
> on it, it would be broken when we introduced the fix on late record
> filtering. Or even worse, we might decide not to fix the issue, worrying
> too much about existing workloads, and give up multiple stateful operators.
>
> The change is arguably not a breaking change, because we never guarantee
> that we won't process data which is earlier than the watermark. The
> guarantee is one-way: we guarantee that a record is processed if its
> event time is later than the watermark. The opposite direction is not
> guaranteed, and we actually documented this in the guide doc.
>
> So the workaround I mentioned cannot be used for capturing dropped late
> records - it does not work as expected. We would need to apply exactly the
> same criteria (probably the same predicate) when capturing them. We are
> aware of the demand for a side-output of dropped late records, and I also
> agree that just having the number of dropped records is never ideal.
>
> Let's see whether we have an opportunity to prioritize this. If you have
> an idea (sketched design) for implementing this, that would be awesome!
>
> On Tue, Oct 10, 2023 at 6:27 PM Bartosz Konieczny 
> wrote:
>
>> Thank you for the clarification, Jungtaek. Indeed, it doesn't sound
>> like a highly demanded feature from the end users; I haven't seen it a lot
>> on StackOverflow or mailing lists. I was just curious about the reasons.
>>
>> Using arbitrary stateful processing could indeed be a workaround! But
>> IMHO it would be easier to expose this watermark value from a function like
>> current_watermark() and let the users do anything with the data. And
>> it wouldn't require dealing with the state store overhead. The
>> function could simplify implementing the *side output pattern*, where we
>> could process the on-time data differently from the late data, e.g. write
>> late data to a dedicated space in the lake and facilitate backfilling
>> for the batch pipelines.
>>
>> With the current_watermark function it could be expressed as simply:
>>
>> streamDataset.foreachBatch((dataframe, batchVersion) => {
>>   dataframe.cache()
>>   dataframe.filter(current_watermark() >
>> event_time_from_dataframe).writeTo("late_data")
>>   dataframe.filter(current_watermark() <=
>> event_time_from_dataframe).writeTo("on_time_data")
>> })
>>
>> A little like what you can do with Apache Flink, in fact:
>>
>> https://github.com/immerok/recipes/blob/main/late-data-to-sink/src/main/java/com/immerok/cookbook/LateDataToSeparateSink.java#L81
>>
>> WDYT?
>>
>> Best,
>> Bartosz.
>>
>> PS. Will be happy to contribute on that if the feature does make sense ;)
>>
>> On Tue, Oct 10, 2023 at 3:23 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Technically speaking, "late data" represents data which cannot be
>>> processed because the engine has already thrown out the state associated
>>> with it.
>>>
>>> That said, the only reason the watermark exists for streaming is to
>>> handle stateful operators. From the engine's point of view, there is no
>>> concept of "late data" for a stateless query. It's something users have to
>>> handle with "filter" by themselves, without relying on the value of the
>>> watermark. I guess someone may see some benefit in automatically tracking
>>> the trend of event time and want to define late data based on the
>>> watermark even in a stateless query, but personally I haven't heard such a
>>> request so far.
>>>
>>> As a workaround you can leverage flatMapGroupsWithState, which provides
>>> the value of the watermark for you, but I'd agree it's too heavyweight
>>> just for this. If we see consistent demand for it, we could probably look
>>> into it and maybe introduce a new SQL function (which works only on
>>> streaming - that's probably a major blocker on introduction) for it.
>>>
>>> On Mon, Oct 9, 2023 at 11:03 AM Bartosz Konieczny <
>>> bartkoniec...@gmail.com> wrote:

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
We wouldn't like to expose the internal mechanism to the public.

As you are a very detail-oriented engineer tracking major changes, you
might have noticed that we "changed" the definition of a late record while
fixing late record handling. Previously a late record was defined as a
record whose event time timestamp is earlier than the "current" watermark.
How has it changed? We now take the "previous" watermark to determine the
late record, because such records are valid inputs for non-first stateful
operators. If we were exposing the function current_watermark(), which
provides the current watermark, and users somehow built a side-output based
on it, it would be broken when we introduced the fix on late record
filtering. Or even worse, we might decide not to fix the issue, worrying
too much about existing workloads, and give up multiple stateful operators.

The change is arguably not a breaking change, because we never guarantee
that we won't process data which is earlier than the watermark. The
guarantee is one-way: we guarantee that a record is processed if its
event time is later than the watermark. The opposite direction is not
guaranteed, and we actually documented this in the guide doc.

So the workaround I mentioned cannot be used for capturing dropped late
records - it does not work as expected. We would need to apply exactly the
same criteria (probably the same predicate) when capturing them. We are
aware of the demand for a side-output of dropped late records, and I also
agree that just having the number of dropped records is never ideal.

Let's see whether we have an opportunity to prioritize this. If you have an
idea (sketched design) for implementing this, that would be awesome!

On Tue, Oct 10, 2023 at 6:27 PM Bartosz Konieczny 
wrote:

> Thank you for the clarification, Jungtaek. Indeed, it doesn't sound like
> a highly demanded feature from the end users; I haven't seen it a lot on
> StackOverflow or mailing lists. I was just curious about the reasons.
>
> Using arbitrary stateful processing could indeed be a workaround! But
> IMHO it would be easier to expose this watermark value from a function like
> current_watermark() and let the users do anything with the data. And it
> wouldn't require dealing with the state store overhead. The function
> could simplify implementing the *side output pattern*, where we could
> process the on-time data differently from the late data, e.g. write late
> data to a dedicated space in the lake and facilitate backfilling for
> the batch pipelines.
>
> With the current_watermark function it could be expressed as simply:
>
> streamDataset.foreachBatch((dataframe, batchVersion) => {
>   dataframe.cache()
>   dataframe.filter(current_watermark() >
> event_time_from_dataframe).writeTo("late_data")
>   dataframe.filter(current_watermark() <=
> event_time_from_dataframe).writeTo("on_time_data")
> })
>
> A little like what you can do with Apache Flink, in fact:
>
> https://github.com/immerok/recipes/blob/main/late-data-to-sink/src/main/java/com/immerok/cookbook/LateDataToSeparateSink.java#L81
>
> WDYT?
>
> Best,
> Bartosz.
>
> PS. Will be happy to contribute on that if the feature does make sense ;)
>
> On Tue, Oct 10, 2023 at 3:23 AM Jungtaek Lim 
> wrote:
>
>> Technically speaking, "late data" represents data which cannot be
>> processed because the engine has already thrown out the state associated
>> with it.
>>
>> That said, the only reason the watermark exists for streaming is to
>> handle stateful operators. From the engine's point of view, there is no
>> concept of "late data" for a stateless query. It's something users have to
>> handle with "filter" by themselves, without relying on the value of the
>> watermark. I guess someone may see some benefit in automatically tracking
>> the trend of event time and want to define late data based on the watermark
>> even in a stateless query, but personally I haven't heard such a request so
>> far.
>>
>> As a workaround you can leverage flatMapGroupsWithState, which provides
>> the value of the watermark for you, but I'd agree it's too heavyweight just
>> for this. If we see consistent demand for it, we could probably look into
>> it and maybe introduce a new SQL function (which works only on streaming -
>> that's probably a major blocker on introduction) for it.
>>
>> On Mon, Oct 9, 2023 at 11:03 AM Bartosz Konieczny <
>> bartkoniec...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've been analyzing the watermark propagation added in 3.5.0
>>> recently and had to return to the basics of watermarks. One question is
>>> still unanswered in my head.
>>>
>>> Why are watermarks reserved for stateful queries? Can't they apply to
>>> filtering late data out only?
>>>
>>> Is the reason only historical, in that the initial design doc
>>>
>>> mentions aggregated queries exclusively? Or are there any technical
>>> limitations why writing the jobs 

Re: Watermark on late data only

2023-10-10 Thread Bartosz Konieczny
Thank you for the clarification, Jungtaek. Indeed, it doesn't sound like
a highly demanded feature from the end users; I haven't seen it a lot on
StackOverflow or mailing lists. I was just curious about the reasons.

Using arbitrary stateful processing could indeed be a workaround! But
IMHO it would be easier to expose this watermark value from a function like
current_watermark() and let the users do anything with the data. And it
wouldn't require dealing with the state store overhead. The function
could simplify implementing the *side output pattern*, where we could
process the on-time data differently from the late data, e.g. write late
data to a dedicated space in the lake and facilitate backfilling for
the batch pipelines.

With the current_watermark function it could be expressed as simply:

streamDataset.foreachBatch((dataframe, batchVersion) => {
  dataframe.cache()
  dataframe.filter(current_watermark() >
event_time_from_dataframe).writeTo("late_data")
  dataframe.filter(current_watermark() <=
event_time_from_dataframe).writeTo("on_time_data")
})

A little like what you can do with Apache Flink, in fact:
https://github.com/immerok/recipes/blob/main/late-data-to-sink/src/main/java/com/immerok/cookbook/LateDataToSeparateSink.java#L81
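
As an aside, something close to this side-output pattern can be approximated
today without a new SQL function by reading the watermark from the query's
last progress event. A minimal sketch, reusing the Click/clicksWithWatermark
example from earlier in the thread and hypothetical output paths; note that
lastProgress is null before the first batch and lags one micro-batch behind,
so the split is only approximate:

import java.sql.Timestamp
import java.time.Instant
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.streaming.StreamingQuery

var runningQuery: StreamingQuery = null

// Split each micro-batch using the watermark reported in the last progress event.
val routeBatch: (DataFrame, Long) => Unit = (df, batchId) => {
  val wm = Option(runningQuery)
    .flatMap(q => Option(q.lastProgress))
    .flatMap(p => Option(p.eventTime.get("watermark")))  // ISO-8601 string, e.g. "2023-06-10T10:04:00.000Z"
    .map(s => Timestamp.from(Instant.parse(s)))
    .getOrElse(new Timestamp(0L))                        // before any progress: treat nothing as late
  df.cache()
  df.filter(col("clickTime") < wm).write.mode("append").parquet("/tmp/late_data")
  df.filter(col("clickTime") >= wm).write.mode("append").parquet("/tmp/on_time_data")
  df.unpersist()
}

runningQuery = clicksWithWatermark.writeStream.foreachBatch(routeBatch).start()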

WDYT?

Best,
Bartosz.

PS. Will be happy to contribute on that if the feature does make sense ;)

On Tue, Oct 10, 2023 at 3:23 AM Jungtaek Lim 
wrote:

> Technically speaking, "late data" represents data which cannot be
> processed because the engine has already thrown out the state associated
> with it.
>
> That said, the only reason the watermark exists for streaming is to handle
> stateful operators. From the engine's point of view, there is no concept
> of "late data" for a stateless query. It's something users have to
> handle with "filter" by themselves, without relying on the value of the
> watermark. I guess someone may see some benefit in automatically tracking
> the trend of event time and want to define late data based on the watermark
> even in a stateless query, but personally I haven't heard such a request so
> far.
>
> As a workaround you can leverage flatMapGroupsWithState, which provides the
> value of the watermark for you, but I'd agree it's too heavyweight just for
> this. If we see consistent demand for it, we could probably look into it and
> maybe introduce a new SQL function (which works only on streaming - that's
> probably a major blocker on introduction) for it.
>
> On Mon, Oct 9, 2023 at 11:03 AM Bartosz Konieczny 
> wrote:
>
>> Hi,
>>
>> I've been analyzing the watermark propagation added in 3.5.0 recently
>> and had to return to the basics of watermarks. One question is still
>> unanswered in my head.
>>
>> Why are watermarks reserved for stateful queries? Can't they apply to
>> filtering late data out only?
>>
>> Is the reason only historical, in that the initial design doc
>>
>> mentions aggregated queries exclusively? Or are there any technical
>> limitations why jobs written like the one below don't drop late data
>> automatically?
>>
>> import sparkSession.implicits._
>> implicit val sparkContext = sparkSession.sqlContext
>> val clicksStream = MemoryStream[Click]
>> val clicksWithWatermark = clicksStream.toDF
>>   .withWatermark("clickTime", "10 minutes")
>> val query =
>> clicksWithWatermark.writeStream.format("console").option("truncate", false)
>>   .start()
>>
>> clicksStream.addData(Seq(
>>   Click(1, Timestamp.valueOf("2023-06-10 10:10:00")),
>>   Click(2, Timestamp.valueOf("2023-06-10 10:12:00")),
>>   Click(3, Timestamp.valueOf("2023-06-10 10:14:00"))
>> ))
>>
>>
>> query.processAllAvailable()
>>
>> clicksStream.addData(Seq(
>>   Click(4, Timestamp.valueOf("2023-06-10 11:00:40")),
>>   Click(5, Timestamp.valueOf("2023-06-10 11:00:30")),
>>   Click(6, Timestamp.valueOf("2023-06-10 11:00:10")),
>>   Click(10, Timestamp.valueOf("2023-06-10 10:00:10"))
>> ))
>> query.processAllAvailable()
>>
>> One quick implementation could be adding a new physical plan rule to the
>> IncrementalExecution
>> 
>> for the EventTimeWatermark node. That's a first thought, maybe too
>> simplistic and hiding some pitfalls?
>>
>> Best,
>> Bartosz.
>> --
>> freelance data engineer
>> https://www.waitingforcode.com
>> https://github.com/bartosz25/
>> https://twitter.com/waitingforcode
>>
>>

-- 
Bartosz Konieczny
freelance data engineer
https://www.waitingforcode.com
https://github.com/bartosz25/
https://twitter.com/waitingforcode


Delimited identifiers.

2023-10-10 Thread Virgil Artimon Palanciuc
Apologies if this has been discussed before; I searched but couldn't find it.
What is the rationale behind picking backticks for identifier delimiters in
Spark?
In the [SQL 92 spec](
https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), the delimited
identifier is unambiguously defined to use double quotes, and lots of
databases/data warehouses accept this syntax:

<delimited identifier> ::=
    <double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::= <delimited identifier part>...

<delimited identifier part> ::=
      <nondoublequote character>
    | <doublequote symbol>

<nondoublequote character> ::= !! See the Syntax Rules

<doublequote symbol> ::= <double quote> <double quote>

Is there any plan to accept double-quote delimited identifiers, at least when
ANSI mode is turned on?

Regards,
Virgil.
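
For reference, a hedged sketch of the behavior in question. The default
parsing shown first is standard Spark behavior; the ANSI configs at the end
reflect my understanding of the double-quoted-identifier option added around
Spark 3.4 and should be verified against your version:

// Default behavior: backticks delimit identifiers, double quotes delimit strings.
spark.sql("SELECT `my col` FROM tbl")      // `my col` parsed as a column identifier
spark.sql("SELECT \"some text\"")          // "some text" parsed as a string literal

// Assumed opt-in (verify for your Spark version): ANSI double-quoted identifiers.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.conf.set("spark.sql.ansi.doubleQuotedIdentifiers", "true")
spark.sql("SELECT \"my col\" FROM tbl")    // now "my col" parsed as an identifier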


Re: Watermark on late data only

2023-10-09 Thread Jungtaek Lim
Technically speaking, "late data" represents data which cannot be
processed because the engine has already thrown out the state associated
with it.

That said, the only reason the watermark exists for streaming is to handle
stateful operators. From the engine's point of view, there is no concept
of "late data" for a stateless query. It's something users have to
handle with "filter" by themselves, without relying on the value of the
watermark. I guess someone may see some benefit in automatically tracking
the trend of event time and want to define late data based on the watermark
even in a stateless query, but personally I haven't heard such a request so
far.

As a workaround you can leverage flatMapGroupsWithState, which provides the
value of the watermark for you, but I'd agree it's too heavyweight just for
this. If we see consistent demand for it, we could probably look into it and
maybe introduce a new SQL function (which works only on streaming - that's
probably a major blocker on introduction) for it.

On Mon, Oct 9, 2023 at 11:03 AM Bartosz Konieczny 
wrote:

> Hi,
>
> I've been analyzing the watermark propagation added in 3.5.0 recently
> and had to return to the basics of watermarks. One question is still
> unanswered in my head.
>
> Why are watermarks reserved for stateful queries? Can't they apply to
> filtering late data out only?
>
> Is the reason only historical, in that the initial design doc
>
> mentions aggregated queries exclusively? Or are there any technical
> limitations why jobs written like the one below don't drop late data
> automatically?
>
> import sparkSession.implicits._
> implicit val sparkContext = sparkSession.sqlContext
> val clicksStream = MemoryStream[Click]
> val clicksWithWatermark = clicksStream.toDF
>   .withWatermark("clickTime", "10 minutes")
> val query =
> clicksWithWatermark.writeStream.format("console").option("truncate", false)
>   .start()
>
> clicksStream.addData(Seq(
>   Click(1, Timestamp.valueOf("2023-06-10 10:10:00")),
>   Click(2, Timestamp.valueOf("2023-06-10 10:12:00")),
>   Click(3, Timestamp.valueOf("2023-06-10 10:14:00"))
> ))
>
>
> query.processAllAvailable()
>
> clicksStream.addData(Seq(
>   Click(4, Timestamp.valueOf("2023-06-10 11:00:40")),
>   Click(5, Timestamp.valueOf("2023-06-10 11:00:30")),
>   Click(6, Timestamp.valueOf("2023-06-10 11:00:10")),
>   Click(10, Timestamp.valueOf("2023-06-10 10:00:10"))
> ))
> query.processAllAvailable()
>
> One quick implementation could be adding a new physical plan rule to the
> IncrementalExecution
> 
> for the EventTimeWatermark node. That's a first thought, maybe too
> simplistic and hiding some pitfalls?
>
> Best,
> Bartosz.
> --
> freelance data engineer
> https://www.waitingforcode.com
> https://github.com/bartosz25/
> https://twitter.com/waitingforcode
>
>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-09 Thread Xinrong Meng
Congratulations!

On Mon, Oct 9, 2023 at 5:06 AM Kent Yao  wrote:

> Congrats!
>
> Kent
>
>
> On Saturday, October 7, 2023, John Zhuge wrote:
>
>> Congratulations!
>>
>> On Fri, Oct 6, 2023 at 6:41 PM Yi Wu 
>> wrote:
>>
>>> Congrats!
>>>
>>> On Sat, Oct 7, 2023 at 9:24 AM XiDuo You  wrote:
>>>
 Congratulations!

 On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote:
 >
 > Congratulations 
 >
 > On Wed, 4 Oct, 2023, 8:52 pm huaxin gao, 
 wrote:
 >>
 >> Congratulations!
 >>
 >> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:
 >>>
 >>> Congratulations!
 >>>
 >>> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:
 
  Congrats!
 
 On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
 >
 > Congratulations!
 >
 >
 >
 > Jie Yang
 >
 >
 >
> From: Dongjoon Hyun 
> Date: Wednesday, October 4, 2023, 13:04
> To: Hyukjin Kwon 
> Cc: Hussein Awala , Rui Wang <
amaliu...@apache.org>, Gengliang Wang , Xiao Li <
gatorsm...@gmail.com>, "dev@spark.apache.org" 
> Subject: Re: Welcome to Our New Apache Spark Committer and PMCs
 >
 >
 >
 > Congratulations!
 >
 >
 >
 > Dongjoon.
 >
 >
 >
 > On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon 
 wrote:
 >
 > Woohoo!
 >
 >
 >
 > On Tue, 3 Oct 2023 at 22:47, Hussein Awala 
 wrote:
 >
 > Congrats to all of you!
 >
 >
 >
 > On Tue 3 Oct 2023 at 08:15, Rui Wang 
 wrote:
 >
 > Congratulations! Well deserved!
 >
 >
 >
 > -Rui
 >
 >
 >
 >
 >
 > On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang 
 wrote:
 >
 > Congratulations to all! Well deserved!
 >
 >
 >
 > On Mon, Oct 2, 2023 at 10:16 PM Xiao Li 
 wrote:
 >
 > Hi all,
 >
 > The Spark PMC is delighted to announce that we have voted to add
 one new committer and two new PMC members. These individuals have
 consistently contributed to the project and have clearly demonstrated their
 expertise.
 >
 > New Committer:
 > - Jiaan Geng (focusing on Spark Connect and Spark SQL)
 >
 > New PMCs:
 > - Yuanjian Li
 > - Yikun Jiang
 >
 > Please join us in extending a warm welcome to them in their new
 roles!
 >
 > Sincerely,
 > The Spark PMC

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-09 Thread Kent Yao
Congrats!

Kent


On Saturday, October 7, 2023, John Zhuge wrote:

> Congratulations!
>
> On Fri, Oct 6, 2023 at 6:41 PM Yi Wu  wrote:
>
>> Congrats!
>>
>> On Sat, Oct 7, 2023 at 9:24 AM XiDuo You  wrote:
>>
>>> Congratulations!
>>>
>>> On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote:
>>> >
>>> > Congratulations 
>>> >
>>> > On Wed, 4 Oct, 2023, 8:52 pm huaxin gao, 
>>> wrote:
>>> >>
>>> >> Congratulations!
>>> >>
>>> >> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:
>>> >>>
>>> >>> Congratulations!
>>> >>>
>>> >>> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> 
>>>  Congrats!
>>> 
>>> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
>>> >
>>> > Congratulations!
>>> >
>>> >
>>> >
>>> > Jie Yang
>>> >
>>> >
>>> >
>>> > From: Dongjoon Hyun 
>>> > Date: Wednesday, October 4, 2023, 13:04
>>> > To: Hyukjin Kwon 
>>> > Cc: Hussein Awala , Rui Wang <
>>> amaliu...@apache.org>, Gengliang Wang , Xiao Li <
>>> gatorsm...@gmail.com>, "dev@spark.apache.org" 
>>> > Subject: Re: Welcome to Our New Apache Spark Committer and PMCs
>>> >
>>> >
>>> >
>>> > Congratulations!
>>> >
>>> >
>>> >
>>> > Dongjoon.
>>> >
>>> >
>>> >
>>> > On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon 
>>> wrote:
>>> >
>>> > Woohoo!
>>> >
>>> >
>>> >
>>> > On Tue, 3 Oct 2023 at 22:47, Hussein Awala 
>>> wrote:
>>> >
>>> > Congrats to all of you!
>>> >
>>> >
>>> >
>>> > On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
>>> >
>>> > Congratulations! Well deserved!
>>> >
>>> >
>>> >
>>> > -Rui
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang 
>>> wrote:
>>> >
>>> > Congratulations to all! Well deserved!
>>> >
>>> >
>>> >
>>> > On Mon, Oct 2, 2023 at 10:16 PM Xiao Li 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > The Spark PMC is delighted to announce that we have voted to add
>>> one new committer and two new PMC members. These individuals have
>>> consistently contributed to the project and have clearly demonstrated their
>>> expertise.
>>> >
>>> > New Committer:
>>> > - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>>> >
>>> > New PMCs:
>>> > - Yuanjian Li
>>> > - Yikun Jiang
>>> >
>>> > Please join us in extending a warm welcome to them in their new
>>> roles!
>>> >
>>> > Sincerely,
>>> > The Spark PMC
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Watermark on late data only

2023-10-08 Thread Bartosz Konieczny
Hi,

I've been analyzing the watermark propagation added in 3.5.0 recently
and had to return to the basics of watermarks. One question is still
unanswered in my head.

Why are watermarks reserved for stateful queries? Can't they apply to
filtering late data out only?

Is the reason only historical, in that the initial design doc

mentions aggregated queries exclusively? Or are there any technical
limitations why jobs written like the one below don't drop late data
automatically?

import java.sql.Timestamp
import org.apache.spark.sql.execution.streaming.MemoryStream

import sparkSession.implicits._
implicit val sparkContext = sparkSession.sqlContext

// Event shape assumed from the usage below.
case class Click(id: Int, clickTime: Timestamp)

val clicksStream = MemoryStream[Click]
val clicksWithWatermark = clicksStream.toDF
  .withWatermark("clickTime", "10 minutes")
val query =
clicksWithWatermark.writeStream.format("console").option("truncate", false)
  .start()

clicksStream.addData(Seq(
  Click(1, Timestamp.valueOf("2023-06-10 10:10:00")),
  Click(2, Timestamp.valueOf("2023-06-10 10:12:00")),
  Click(3, Timestamp.valueOf("2023-06-10 10:14:00"))
))


query.processAllAvailable()

clicksStream.addData(Seq(
  Click(4, Timestamp.valueOf("2023-06-10 11:00:40")),
  Click(5, Timestamp.valueOf("2023-06-10 11:00:30")),
  Click(6, Timestamp.valueOf("2023-06-10 11:00:10")),
  Click(10, Timestamp.valueOf("2023-06-10 10:00:10"))
))
query.processAllAvailable()

One quick implementation could be adding a new physical plan rule to the
IncrementalExecution

for the EventTimeWatermark node. That's a first thought, maybe too
simplistic and hiding some pitfalls?

Best,
Bartosz.
-- 
freelance data engineer
https://www.waitingforcode.com
https://github.com/bartosz25/
https://twitter.com/waitingforcode


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-06 Thread John Zhuge
Congratulations!

On Fri, Oct 6, 2023 at 6:41 PM Yi Wu  wrote:

> Congrats!
>
> On Sat, Oct 7, 2023 at 9:24 AM XiDuo You  wrote:
>
>> Congratulations!
>>
>> On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote:
>> >
>> > Congratulations 
>> >
>> > On Wed, 4 Oct, 2023, 8:52 pm huaxin gao, 
>> wrote:
>> >>
>> >> Congratulations!
>> >>
>> >> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:
>> >>>
>> >>> Congratulations!
>> >>>
>> >>> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> 
>>  Congrats!
>> 
>> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
>> >
>> > Congratulations!
>> >
>> >
>> >
>> > Jie Yang
>> >
>> >
>> >
>> > From: Dongjoon Hyun 
>> > Date: Wednesday, October 4, 2023, 13:04
>> > To: Hyukjin Kwon 
>> > Cc: Hussein Awala , Rui Wang <
>> amaliu...@apache.org>, Gengliang Wang , Xiao Li <
>> gatorsm...@gmail.com>, "dev@spark.apache.org" 
>> > Subject: Re: Welcome to Our New Apache Spark Committer and PMCs
>> >
>> >
>> >
>> > Congratulations!
>> >
>> >
>> >
>> > Dongjoon.
>> >
>> >
>> >
>> > On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon 
>> wrote:
>> >
>> > Woohoo!
>> >
>> >
>> >
>> > On Tue, 3 Oct 2023 at 22:47, Hussein Awala 
>> wrote:
>> >
>> > Congrats to all of you!
>> >
>> >
>> >
>> > On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
>> >
>> > Congratulations! Well deserved!
>> >
>> >
>> >
>> > -Rui
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang 
>> wrote:
>> >
>> > Congratulations to all! Well deserved!
>> >
>> >
>> >
>> > On Mon, Oct 2, 2023 at 10:16 PM Xiao Li 
>> wrote:
>> >
>> > Hi all,
>> >
>> > The Spark PMC is delighted to announce that we have voted to add
>> one new committer and two new PMC members. These individuals have
>> consistently contributed to the project and have clearly demonstrated their
>> expertise.
>> >
>> > New Committer:
>> > - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>> >
>> > New PMCs:
>> > - Yuanjian Li
>> > - Yikun Jiang
>> >
>> > Please join us in extending a warm welcome to them in their new
>> roles!
>> >
>> > Sincerely,
>> > The Spark PMC
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-06 Thread Yi Wu
Congrats!

On Sat, Oct 7, 2023 at 9:24 AM XiDuo You  wrote:

> Congratulations!
>
> On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote:
> >
> > Congratulations 
> >
> > On Wed, 4 Oct, 2023, 8:52 pm huaxin gao,  wrote:
> >>
> >> Congratulations!
> >>
> >> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:
> >>>
> >>> Congratulations!
> >>>
> >>> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> 
>  Congrats!
> 
> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
> >
> > Congratulations!
> >
> >
> >
> > Jie Yang
> >
> >
> >
> > From: Dongjoon Hyun 
> > Date: Wednesday, October 4, 2023, 13:04
> > To: Hyukjin Kwon 
> > Cc: Hussein Awala , Rui Wang ,
> Gengliang Wang , Xiao Li , "
> dev@spark.apache.org" 
> > Subject: Re: Welcome to Our New Apache Spark Committer and PMCs
> >
> >
> >
> > Congratulations!
> >
> >
> >
> > Dongjoon.
> >
> >
> >
> > On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon 
> wrote:
> >
> > Woohoo!
> >
> >
> >
> > On Tue, 3 Oct 2023 at 22:47, Hussein Awala  wrote:
> >
> > Congrats to all of you!
> >
> >
> >
> > On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
> >
> > Congratulations! Well deserved!
> >
> >
> >
> > -Rui
> >
> >
> >
> >
> >
> > On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang 
> wrote:
> >
> > Congratulations to all! Well deserved!
> >
> >
> >
> > On Mon, Oct 2, 2023 at 10:16 PM Xiao Li 
> wrote:
> >
> > Hi all,
> >
> > The Spark PMC is delighted to announce that we have voted to add one
> new committer and two new PMC members. These individuals have consistently
> contributed to the project and have clearly demonstrated their expertise.
> >
> > New Committer:
> > - Jiaan Geng (focusing on Spark Connect and Spark SQL)
> >
> > New PMCs:
> > - Yuanjian Li
> > - Yikun Jiang
> >
> > Please join us in extending a warm welcome to them in their new
> roles!
> >
> > Sincerely,
> > The Spark PMC
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-06 Thread XiDuo You
Congratulations!

On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote:
>
> Congratulations 
>
> On Wed, 4 Oct, 2023, 8:52 pm huaxin gao,  wrote:
>>
>> Congratulations!
>>
>> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:
>>>
>>> Congratulations!
>>>
>>> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim  
>>> wrote:

 Congrats!

 On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
>
> Congratulations!
>
>
>
> Jie Yang
>
>
>
> From: Dongjoon Hyun 
> Date: Wednesday, October 4, 2023, 13:04
> To: Hyukjin Kwon 
> Cc: Hussein Awala , Rui Wang , 
> Gengliang Wang , Xiao Li , 
> "dev@spark.apache.org" 
> Subject: Re: Welcome to Our New Apache Spark Committer and PMCs
>
>
>
> Congratulations!
>
>
>
> Dongjoon.
>
>
>
> On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon  wrote:
>
> Woohoo!
>
>
>
> On Tue, 3 Oct 2023 at 22:47, Hussein Awala  wrote:
>
> Congrats to all of you!
>
>
>
> On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
>
> Congratulations! Well deserved!
>
>
>
> -Rui
>
>
>
>
>
> On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang  wrote:
>
> Congratulations to all! Well deserved!
>
>
>
> On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:
>
> Hi all,
>
> The Spark PMC is delighted to announce that we have voted to add one new 
> committer and two new PMC members. These individuals have consistently 
> contributed to the project and have clearly demonstrated their expertise.
>
> New Committer:
> - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>
> New PMCs:
> - Yuanjian Li
> - Yikun Jiang
>
> Please join us in extending a warm welcome to them in their new roles!
>
> Sincerely,
> The Spark PMC

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-05 Thread Prashant Sharma
Congratulations 

On Wed, 4 Oct, 2023, 8:52 pm huaxin gao,  wrote:

> Congratulations!
>
> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:
>
>> Congratulations!
>>
>> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim 
>> wrote:
>>
>>> Congrats!
>>>
>>> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
>>>
 Congratulations!



 Jie Yang



 *From:* Dongjoon Hyun 
 *Date:* Wednesday, October 4, 2023, 13:04
 *To:* Hyukjin Kwon 
 *Cc:* Hussein Awala , Rui Wang <
 amaliu...@apache.org>, Gengliang Wang , Xiao Li <
 gatorsm...@gmail.com>, "dev@spark.apache.org" 
 *Subject:* Re: Welcome to Our New Apache Spark Committer and PMCs



 Congratulations!



 Dongjoon.



 On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon 
 wrote:

 Woohoo!



 On Tue, 3 Oct 2023 at 22:47, Hussein Awala  wrote:

 Congrats to all of you!



 On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:

 Congratulations! Well deserved!



 -Rui





 On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang 
 wrote:

 Congratulations to all! Well deserved!



 On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:

 Hi all,

 The Spark PMC is delighted to announce that we have voted to add one
 new committer and two new PMC members. These individuals have consistently
 contributed to the project and have clearly demonstrated their expertise.

 New Committer:
 - Jiaan Geng (focusing on Spark Connect and Spark SQL)

 New PMCs:
 - Yuanjian Li
 - Yikun Jiang

 Please join us in extending a warm welcome to them in their new roles!

 Sincerely,
 The Spark PMC




Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread huaxin gao
Congratulations!

On Wed, Oct 4, 2023 at 7:39 AM Chao Sun  wrote:

> Congratulations!
>
> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim 
> wrote:
>
>> Congrats!
>>
>> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
>>
>>> Congratulations!
>>>
>>>
>>>
>>> Jie Yang
>>>
>>>
>>>
>>> *From:* Dongjoon Hyun 
>>> *Date:* Wednesday, October 4, 2023, 13:04
>>> *To:* Hyukjin Kwon 
>>> *Cc:* Hussein Awala , Rui Wang ,
>>> Gengliang Wang , Xiao Li , "
>>> dev@spark.apache.org" 
>>> *Subject:* Re: Welcome to Our New Apache Spark Committer and PMCs
>>>
>>>
>>>
>>> Congratulations!
>>>
>>>
>>>
>>> Dongjoon.
>>>
>>>
>>>
>>> On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon 
>>> wrote:
>>>
>>> Woohoo!
>>>
>>>
>>>
>>> On Tue, 3 Oct 2023 at 22:47, Hussein Awala  wrote:
>>>
>>> Congrats to all of you!
>>>
>>>
>>>
>>> On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
>>>
>>> Congratulations! Well deserved!
>>>
>>>
>>>
>>> -Rui
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang  wrote:
>>>
>>> Congratulations to all! Well deserved!
>>>
>>>
>>>
>>> On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:
>>>
>>> Hi all,
>>>
>>> The Spark PMC is delighted to announce that we have voted to add one new
>>> committer and two new PMC members. These individuals have consistently
>>> contributed to the project and have clearly demonstrated their expertise.
>>>
>>> New Committer:
>>> - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>>>
>>> New PMCs:
>>> - Yuanjian Li
>>> - Yikun Jiang
>>>
>>> Please join us in extending a warm welcome to them in their new roles!
>>>
>>> Sincerely,
>>> The Spark PMC
>>>
>>>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread Chao Sun
Congratulations!

On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim 
wrote:

> Congrats!
>
> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote:
>
>> Congratulations!
>>
>>
>>
>> Jie Yang
>>
>>
>>
>> *From:* Dongjoon Hyun 
>> *Date:* Wednesday, October 4, 2023, 13:04
>> *To:* Hyukjin Kwon 
>> *Cc:* Hussein Awala , Rui Wang ,
>> Gengliang Wang , Xiao Li , "
>> dev@spark.apache.org" 
>> *Subject:* Re: Welcome to Our New Apache Spark Committer and PMCs
>>
>>
>>
>> Congratulations!
>>
>>
>>
>> Dongjoon.
>>
>>
>>
>> On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon  wrote:
>>
>> Woohoo!
>>
>>
>>
>> On Tue, 3 Oct 2023 at 22:47, Hussein Awala  wrote:
>>
>> Congrats to all of you!
>>
>>
>>
>> On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:
>>
>> Congratulations! Well deserved!
>>
>>
>>
>> -Rui
>>
>>
>>
>>
>>
>> On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang  wrote:
>>
>> Congratulations to all! Well deserved!
>>
>>
>>
>> On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:
>>
>> Hi all,
>>
>> The Spark PMC is delighted to announce that we have voted to add one new
>> committer and two new PMC members. These individuals have consistently
>> contributed to the project and have clearly demonstrated their expertise.
>>
>> New Committer:
>> - Jiaan Geng (focusing on Spark Connect and Spark SQL)
>>
>> New PMCs:
>> - Yuanjian Li
>> - Yikun Jiang
>>
>> Please join us in extending a warm welcome to them in their new roles!
>>
>> Sincerely,
>> The Spark PMC
>>
>>


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread Jungtaek Lim
Congrats!

On Wednesday, October 4, 2023 at 5:04 PM, yangjie01 wrote:

> Congratulations!


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread yangjie01
Congratulations!

Jie Yang

From: Dongjoon Hyun 
Date: Wednesday, October 4, 2023 13:04
To: Hyukjin Kwon 
Cc: Hussein Awala , Rui Wang , 
Gengliang Wang , Xiao Li , 
"dev@spark.apache.org" 
Subject: Re: Welcome to Our New Apache Spark Committer and PMCs

Congratulations!

Dongjoon.



Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Dongjoon Hyun
Congratulations!

Dongjoon.

On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon  wrote:

> Woohoo!



Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Wenchen Fan
Congrats!

On Wed, Oct 4, 2023 at 8:25 AM Hyukjin Kwon  wrote:

> Woohoo!



Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Hyukjin Kwon
Woohoo!

On Tue, 3 Oct 2023 at 22:47, Hussein Awala  wrote:

> Congrats to all of you!


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Mridul Muralidharan
Congratulations!
Looking forward to more exciting contributions :-)

Regards,
Mridul

On Tue, Oct 3, 2023 at 2:51 AM Hussein Awala  wrote:

> Congrats to all of you!


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Hussein Awala
Congrats to all of you!

On Tue 3 Oct 2023 at 08:15, Rui Wang  wrote:

> Congratulations! Well deserved!
>
> -Rui


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Rui Wang
Congratulations! Well deserved!

-Rui


On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang  wrote:

> Congratulations to all! Well deserved!


Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-02 Thread Gengliang Wang
Congratulations to all! Well deserved!

On Mon, Oct 2, 2023 at 10:16 PM Xiao Li  wrote:



Welcome to Our New Apache Spark Committer and PMCs

2023-10-02 Thread Xiao Li
Hi all,

The Spark PMC is delighted to announce that we have voted to add one new
committer and two new PMC members. These individuals have consistently
contributed to the project and have clearly demonstrated their expertise.

New Committer:
- Jiaan Geng (focusing on Spark Connect and Spark SQL)

New PMCs:
- Yuanjian Li
- Yikun Jiang

Please join us in extending a warm welcome to them in their new roles!

Sincerely,
The Spark PMC


[RESULT] Updating documentation hosted for EOL and maintenance releases

2023-09-29 Thread Hyukjin Kwon
The vote passes with 9 +1s (6 binding +1s).

(* = binding)
+1:
- Hyukjin Kwon *
- Ruifeng Zheng *
- Jiaan Geng
- Yikun Jiang *
- Herman van Hovell *
- Michel Miotto Barbosa
- Maciej Szymkiewicz *
- Denny Lee
- Yuanjian Li *


unsubscribe

2023-09-26 Thread praveen rao joginapally



Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-26 Thread Hyukjin Kwon
Awesome!

On Wed, 27 Sept 2023 at 11:02, Hussein Awala  wrote:

> I installed the package, tested it with the Kubernetes master from Jupyter,
> and tested it with the Spark Connect server; all looks good.


Re: Migrating the Junit framework used in Apache Spark 4.0 from 4.x to 5.x

2023-09-26 Thread Mridul Muralidharan
+1 for moving to a newer version.
Thanks for driving this, Jie Yang!

Regards,
Mridul


On Mon, Sep 25, 2023 at 10:15 AM 杨杰  wrote:

> Hi all,
>
> In SPARK-44170 (apache/spark#43074 [1]), I’m trying to migrate the JUnit
> test framework used in Spark 4.0 from JUnit 4 to JUnit 5.


Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-26 Thread Hussein Awala
I installed the package, tested it with the Kubernetes master from Jupyter, and
tested it with the Spark Connect server; all looks good.

On Tue, Sep 26, 2023 at 10:45 PM Yuanjian Li  wrote:

> FYI, we received a response from the PyPI org yesterday, and the upload
> of version 3.5.0 has just been completed. Please assist in verifying it.
> Thank you!


Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-26 Thread Yuanjian Li
FYI, we received a response from the PyPI org yesterday, and the upload
of version 3.5.0 has just been completed. Please assist in verifying it.
Thank you!

On Sun, Sep 17, 2023 at 23:28, Ruifeng Zheng wrote:

> Thanks Yuanjian for driving this release, Congratulations!


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Yuanjian Li
+1

On Tue, Sep 26, 2023 at 12:07, Denny Lee wrote:

> +1

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Denny Lee
+1

On Tue, Sep 26, 2023 at 10:52 Maciej  wrote:

> +1


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Maciej

+1

Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

On 9/26/23 17:12, Michel Miotto Barbosa wrote:

+1



Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Michel Miotto Barbosa
+1

A disposição | At your disposal

Michel Miotto Barbosa
https://www.linkedin.com/in/michelmiottobarbosa/
mmiottobarb...@gmail.com
+55 11 984 342 347




On Tue, Sep 26, 2023 at 11:44 AM Herman van Hovell
 wrote:

> +1


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Herman van Hovell
+1

On Tue, Sep 26, 2023 at 10:39 AM yangjie01 
wrote:

> +1


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread yangjie01
+1

From: Yikun Jiang 
Date: Tuesday, September 26, 2023 18:06
To: dev 
Cc: Hyukjin Kwon , Ruifeng Zheng 
Subject: Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

+1, I believe it is a wise choice to update the EOL policy of the document 
based on the real demands of community users.

Regards,
Yikun




Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Yikun Jiang
+1, I believe it is a wise choice to update the EOL policy of the document
based on the real demands of community users.

Regards,
Yikun


On Tue, Sep 26, 2023 at 1:06 PM Ruifeng Zheng  wrote:

> +1


Re: Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread beliefer
+1

At 2023-09-26 13:03:56, "Ruifeng Zheng"  wrote:

+1

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-25 Thread Ruifeng Zheng
+1

On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon  wrote:



[VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-25 Thread Hyukjin Kwon
Hi all,

I would like to start the vote for updating documentation hosted for EOL
and maintenance releases to improve the usability here, and in order for
end users to read the proper and correct documentation.

For discussion thread, please refer to
https://lists.apache.org/thread/1675rzxx5x4j2x03t9x0kfph8tlys0cx.

Here is one example:
- https://github.com/apache/spark/pull/42989
- https://github.com/apache/spark-website/pull/480

Starting with my own +1.


Migrating the Junit framework used in Apache Spark 4.0 from 4.x to 5.x

2023-09-25 Thread 杨杰
Hi all,

In SPARK-44170 (apache/spark#43074 [1]), I’m trying to migrate the JUnit
test framework used in Spark 4.0 from JUnit 4 to JUnit 5.


Although this involves a fair amount of code modification, given that
JUnit 4 is still developed against Java 6 source code and hasn't
released a new version for over two years (the JUnit 4.13.2 that Spark
currently uses was released on February 14, 2021), I personally believe
it's worth it.

Feel free to comment if you have any concerns.

[1] https://github.com/apache/spark/pull/43074

Thanks,
Jie Yang
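
For readers who haven't done such a migration, the typical per-test change is
small. A rough sketch in Scala (editor's illustration, not from the thread;
the suite and assertion are hypothetical, the imports are the standard JUnit
package names):

// JUnit 4:
//   import org.junit.Test
//   import org.junit.Assert.assertEquals

// JUnit 5 (Jupiter):
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class ExampleSuite {
  @Test
  def additionWorks(): Unit = {
    // Same (expected, actual) argument order as JUnit 4; in JUnit 5 the
    // optional failure message moves to the last parameter.
    assertEquals(4, 2 + 2)
  }
}

The annotation and assertion names mostly carry over; the bulk of the work is
updating imports, runners, and lifecycle annotations (e.g. @Before/@After
become @BeforeEach/@AfterEach).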


unsubscribe

2023-09-24 Thread Wei Hong
unsubscribe


Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-24 Thread Mich Talebzadeh
LOL,

Hindsight is a very good thing, and one often learns these lessons through
experience. Once told off because strict ordering was not maintained, the
lesson will never be forgotten!

HTH


Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


view my Linkedin profile
https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 23 Sept 2023 at 13:29, Steve Loughran 
wrote:

>
> Now, if you are ruthless it'd make sense to randomise the order of results
> if someone left out the order by, to stop complacency.


Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-23 Thread Steve Loughran
Now, if you are ruthless it'd make sense to randomise the order of results
if someone left out the order by, to stop complacency.

like that time Sun changed the ordering that methods were returned in a
Class.listMethods() call and everyone's junit test cases failed if they'd
assumed that ordering was that of the source file -which it was until then,
even though the language spec said "no guarantees".

People code for what works, not what is documented in places they don't
read. (this is also why anyone writing network code should really have a
flaky network connection to keep themselves honest)

On Sat, 23 Sept 2023 at 11:00, beliefer  wrote:

> AFAIK, the order is unspecified whether it's SQL without an ORDER BY clause
> or a DataFrame without a sort. The behavior is consistent between them.


Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-23 Thread beliefer
AFAIK, the order is unspecified whether it's SQL without an ORDER BY clause
or a DataFrame without a sort. The behavior is consistent between them.

At 2023-09-18 23:47:40, "Nicholas Chammas"  wrote:



[DISCUSS] Porting back SPARK-45178 to 3.5/3.4 version lines

2023-09-20 Thread Jungtaek Lim
Hi devs,

I'd like to get some input on dealing with the possible correctness issue
we found. The JIRA ticket is SPARK-45178
, where I described the
issue and the solution I proposed.

Context:
A source might behave incorrectly, leading to correctness issues, if it does
not support Trigger.AvailableNow and users set the trigger to
Trigger.AvailableNow. This is due to the incompatibility between the fallback
implementation of Trigger.AvailableNow and the source implementation. As a
solution, we want to fall back to a single-batch execution instead for such
cases.

The proposal is approved and merged in the master branch (I guess there is
no issue as it's a major release), but since this introduces a behavioral
change, I'd like to hear voices on whether we want to introduce a behavioral
change in bugfix versions to address a possible correctness issue, or leave
these version lines as they are.

Looking for voices on this.

Thanks in advance!
Jungtaek Lim (HeartSaVioR)
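
For context, a minimal sketch of the trigger in question (editor's
illustration in Scala, not part of the proposal; it assumes an existing
SparkSession named spark and hypothetical local paths; file sources support
Trigger.AvailableNow natively):

import org.apache.spark.sql.streaming.Trigger

val query = spark.readStream
  .format("text")
  .load("/tmp/stream-input")  // hypothetical input directory
  .writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/stream-checkpoint")  // hypothetical path
  .trigger(Trigger.AvailableNow())  // process all available data, then stop
  .start()

The issue described above arises when the source behind readStream does not
implement this trigger and Spark applies the fallback wrapper instead.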


Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Dongjoon Hyun
Instead, I believe you are looking for
`spark.sql.sources.useV1SourceList` if the question is about "Concretely,
is the plan for Spark 4 to continue defaulting to the built-in v1 data
sources?".

Here is the code.

https://github.com/apache/spark/blob/324a07b534ac8c2e83a50ac5ea4c5d93fd57b790/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3148-L3155

Dongjoon.
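
As a quick illustration of that config (editor's sketch in Scala; the first
value shown is, to my understanding, the default list, and the read at the
end uses a hypothetical path):

// Built-in sources named in the list resolve to their V1 implementations.
spark.conf.set(
  "spark.sql.sources.useV1SourceList",
  "avro,csv,json,kafka,orc,parquet,text")

// Dropping "parquet" from the list makes Spark plan parquet scans with the
// V2 source instead:
spark.conf.set(
  "spark.sql.sources.useV1SourceList",
  "avro,csv,json,kafka,orc,text")
val df = spark.read.parquet("/tmp/data.parquet")  // hypothetical path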



On Wed, Sep 20, 2023 at 5:47 AM Will Raschkowski 
wrote:

> Thank you for linking that, Dongjoon!


Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon!

I found SPARK-44518 in that 
list which wants to turn Spark’s Hive integration into a data source. To think 
out loud: The big gaps between built-in v1 and v2 data sources are support for 
bucketing and partitioning. And the reason v1 data sources support those is 
because they’re kind of interleaved with Spark’s Hive integration. Separating 
that Hive integration or making it more data source-ish would put us close to 
supporting bucketing and partitioning in v2 and then defaulting to v2. (Just my 
understanding – curious if I’m thinking about this correctly).

Anyway, thank you for the pointer.

From: Dongjoon Hyun 
Date: Friday, 15 September 2023 at 05:36
To: Will Raschkowski 
Cc: dev@spark.apache.org 
Subject: Re: Plans for built-in v2 data sources in Spark 4



Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon!

I found SPARK-44518 in that 
list which wants to turn Spark’s Hive integration into a data source. IIUC, 
that’s very related but I’m curious if I’m thinking about this correctly:

Big gaps between built-in v1 and v2 data sources are support for bucketing and 
partitioning. And the reason v1 data sources support those is because the v1 
paths are kind of interleaved with Spark’s Hive integration. I understand 
separating that Hive integration or making it more data source-ish would put us 
closer to supporting bucketing and partitioning in v2 and then defaulting to v2.

From: Dongjoon Hyun 
Date: Friday, 15 September 2023 at 05:36
To: Will Raschkowski 
Cc: dev@spark.apache.org 
Subject: Re: Plans for built-in v2 data sources in Spark 4



Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Mich Talebzadeh
These are good points. In traditional RDBMSs, SQL query results without an
explicit *ORDER BY* clause may vary in order due to optimization,
especially when no clustered index is defined. In contrast, systems like
Hive and Spark SQL, which are based on distributed file storage, do not
rely on physical data order (co-location of data blocks). They deploy
techniques like columnar storage and predicate pushdown instead of
traditional indexing due to the distributed nature of their storage
systems.

HTH


On Mon, 18 Sept 2023 at 20:19, Sean Owen  wrote:



Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Mich Talebzadeh
Hi Nicholas,

Your point

"In SQL, the result order of any query is implementation-dependent without
an explicit ORDER BY clause. Technically, you could run `SELECT * FROM
table;` 10 times in a row and get 10 different orderings."

Yes, I concur; my understanding is the same.

In SQL, the result order of any query is implementation-dependent without
an explicit ORDER BY clause. Basically this means that the database engine
is free to return the results in any order that it sees fit. This is
because SQL does not guarantee a specific order for results unless an ORDER
BY clause is used.

HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


view my Linkedin profile
https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 18 Sept 2023 at 16:58, Reynold Xin 
wrote:

> It should be the same as SQL. Otherwise it takes away a lot of potential
> future optimization opportunities.


Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Reynold Xin
It should be the same as SQL. Otherwise it takes away a lot of potential future 
optimization opportunities.

On Mon, Sep 18 2023 at 8:47 AM, Nicholas Chammas < nicholas.cham...@gmail.com > 
wrote:



Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Sean Owen
I think it's the same, and always has been - yes you don't have a
guaranteed ordering unless an operation produces a specific ordering. Could
be the result of order by, yes; I believe you would be guaranteed that
reading input files results in data in the order they appear in the file,
etc. 1:1 operations like map() don't change ordering. But not the result of
a shuffle, for example. So yeah anything like limit or head might give
different results in the future (or simply on different cluster setups with
different parallelism, etc). The existence of operations like offset
doesn't contradict that. Maybe that's totally fine in some situations (ex:
I just want to display some sample rows) but otherwise yeah you've always
had to state your ordering for "first" or "nth" to have a guaranteed result.
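
A minimal illustration of the point (editor's sketch in Scala; assumes an
existing SparkSession named spark):

// After a shuffle, partition and row order are unspecified.
val df = spark.range(0, 1000).repartition(8)

df.head()                // "first" row may differ across runs or cluster setups
df.orderBy("id").head()  // deterministic: always the row with id = 0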

On Mon, Sep 18, 2023 at 10:48 AM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:



Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Nicholas Chammas
I’ve always considered DataFrames to be logically equivalent to SQL tables or 
queries.

In SQL, the result order of any query is implementation-dependent without an 
explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10 
times in a row and get 10 different orderings.

I thought the same applied to DataFrames, but the docstring for the recently
added method DataFrame.offset implies otherwise.

This example will work fine in practice, of course. But if DataFrames are 
technically unordered without an explicit ordering clause, then in theory a 
future implementation change may result in “Bob" being the “first” row in the 
DataFrame, rather than “Tom”. That would make the example incorrect.

Is that not the case?

Nick
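
For what it's worth, such an example stays correct once an explicit ordering
is added. A hedged sketch in Scala (editor's illustration; assumes an
existing SparkSession named spark, with hypothetical data loosely mirroring
the names in the docstring):

import spark.implicits._

val df = Seq(("Tom", 20), ("Bob", 30)).toDF("name", "age")

// Which row offset(1) skips is implementation-dependent without a sort;
// with an explicit orderBy the result is guaranteed:
df.orderBy("age").offset(1).show()  // always the ("Bob", 30) row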



Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Ruifeng Zheng
Thanks Yuanjian for driving this release, Congratulations!

On Mon, Sep 18, 2023 at 2:16 PM Maxim Gekk
 wrote:

> Thank you for the work, Yuanjian!

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Maxim Gekk
Thank you for the work, Yuanjian!

On Mon, Sep 18, 2023 at 6:28 AM beliefer  wrote:

> Congratulations! Apache Spark.


Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-17 Thread beliefer
Congratulations! Apache Spark.

At 2023-09-16 01:01:40, "Yuanjian Li"  wrote:

Re: First Time contribution.

2023-09-17 Thread Haejoon Lee
Welcome Ram! :-)

I would recommend checking out
https://issues.apache.org/jira/browse/SPARK-37935 as a starter task.

Refer to https://github.com/apache/spark/pull/41504,
https://github.com/apache/spark/pull/41455 as example PRs.

Or you can also add a new sub-task if you find any error messages that need
improvement.

Thanks!

On Mon, Sep 18, 2023 at 9:33 AM Denny Lee  wrote:



Re: First Time contribution.

2023-09-17 Thread Denny Lee
Hi Ram,

We have some good guidance at
https://spark.apache.org/contributing.html

HTH!
Denny


On Sun, Sep 17, 2023 at 17:18 ram manickam  wrote:

>
>
>
> Hello All,
> Recently joined this community and would like to contribute. Is there a
> guideline or recommendation on tasks that can be picked up by a first-timer,
> or a starter task?
>
> Tried looking at the Stack Overflow tag: apache-spark
> , but couldn't find
> any information for first-time contributors.
>
> Looking forward to learning and contributing.
>
> Thanks
> Ram
>


[ANNOUNCE] Apache Spark 3.5.0 released

2023-09-15 Thread Yuanjian Li
Hi All,

We are happy to announce the availability of *Apache Spark 3.5.0*!

Apache Spark 3.5.0 is the sixth release of the 3.x line.

To download Spark 3.5.0, head over to the download page:
https://spark.apache.org/downloads.html
(Please note: the PyPi upload is pending due to a size limit request; we're
actively following up here 
with the PyPi organization)

To view the release notes:
https://spark.apache.org/releases/spark-release-3-5-0.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.

Best,
Yuanjian


Re: Plans for built-in v2 data sources in Spark 4

2023-09-14 Thread Dongjoon Hyun
Hi, Will.

According to the following JIRA, as of now, there is no plan or ongoing
discussion to switch it.

https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark
4.0.0)

Thanks,
Dongjoon.


On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski
 wrote:



Re: Write Spark Connection client application in Go

2023-09-14 Thread bo yang
Thanks Holden and Martin for the nice words and feedback :)

On Wed, Sep 13, 2023 at 8:22 AM Martin Grund  wrote:

> This is absolutely awesome! Thank you so much for dedicating your time to
> this project!


Plans for built-in v2 data sources in Spark 4

2023-09-13 Thread Will Raschkowski
Hey everyone,

I was wondering what the plans are for Spark's built-in v2 file data sources in 
Spark 4.

Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 
data sources? And if yes, what are the blockers for defaulting to v2? I see, 
just as example, that writing Hive-partitions is not supported in v2. Are there 
other blockers or outstanding discussions?

Regards,
Will



Re: Write Spark Connection client application in Go

2023-09-13 Thread Martin Grund
This is absolutely awesome! Thank you so much for dedicating your time to
this project!


On Wed, Sep 13, 2023 at 6:04 AM Holden Karau  wrote:

> That’s so cool! Great work y’all :)


unsubscribe

2023-09-13 Thread ankur



Re: Write Spark Connect client application in Go

2023-09-12 Thread Holden Karau
That’s so cool! Great work y’all :)

On Tue, Sep 12, 2023 at 8:14 PM bo yang  wrote:

> Hi Spark Friends,
>
> Anyone interested in using Golang to write Spark applications? We created a
> Spark Connect Go Client library.
> Would love to hear feedback/thoughts from the community.
>
> Please see the quick start guide
> about how to use it. Following is a very short Spark Connect application in
> Go:
>
> func main() {
>     spark, _ := sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
>     defer spark.Stop()
>
>     df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
>     df.Show(100, false)
>     df.Collect()
>
>     df.Write().Mode("overwrite").
>         Format("parquet").
>         Save("file:///tmp/spark-connect-write-example-output.parquet")
>
>     df = spark.Read().Format("parquet").
>         Load("file:///tmp/spark-connect-write-example-output.parquet")
>     df.Show(100, false)
>
>     df.CreateTempView("view1", true, false)
>     df, _ = spark.Sql("select count, word from view1 order by count")
> }
>
>
> Many thanks to Martin, Hyukjin, Ruifeng and Denny for creating and working
> together on this repo! Welcome more people to contribute :)
>
> Best,
> Bo
>
>


unsubscribe

2023-09-12 Thread 杨军
unsubscribe

Write Spark Connect client application in Go

2023-09-12 Thread bo yang
Hi Spark Friends,

Anyone interested in using Golang to write Spark applications? We
created a Spark Connect Go Client library.
Would love to hear feedback/thoughts from the community.

Please see the quick start guide
about how to use it. Following is a very short Spark Connect application in
Go:

func main() {
    // Connect to a local Spark Connect server.
    spark, _ := sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
    defer spark.Stop()

    // Run a SQL query, print it, and pull the rows to the client.
    df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
    df.Show(100, false)
    df.Collect()

    // Write the result out as parquet, then read it back.
    df.Write().Mode("overwrite").
        Format("parquet").
        Save("file:///tmp/spark-connect-write-example-output.parquet")

    df = spark.Read().Format("parquet").
        Load("file:///tmp/spark-connect-write-example-output.parquet")
    df.Show(100, false)

    // Register a temporary view and query it with SQL.
    df.CreateTempView("view1", true, false)
    df, _ = spark.Sql("select count, word from view1 order by count")
}
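
The example above drops every error for brevity; in real code the same
calls are worth checking, since Build and Sql both return an error (as the
"spark, _ :=" and "df, _ :=" assignments show). A minimal sketch of the
same setup with errors surfaced -- only calls already shown above are
used, and the import path is illustrative.

package main

import (
	"log"

	// Illustrative import path; see the quick start guide for the
	// exact module path.
	"github.com/apache/spark-connect-go/client/sql"
)

func main() {
	// Same builder call as above, but surface the connection error.
	spark, err := sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
	if err != nil {
		log.Fatalf("failed to connect to Spark Connect server: %v", err)
	}
	defer spark.Stop()

	df, err := spark.Sql("select 'apple' as word, 123 as count")
	if err != nil {
		log.Fatalf("query failed: %v", err)
	}
	df.Show(10, false)
}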


Many thanks to Martin, Hyukjin, Ruifeng and Denny for creating and working
together on this repo! Welcome more people to contribute :)

Best,
Bo


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-12 Thread XiDuo You
+1 (non-binding)

On Tue, Sep 12, 2023 at 15:14 Jungtaek Lim wrote:
>
> +1 (non-binding)
>
> Thanks for driving this release and the patience on multiple RCs!
>
> On Tue, Sep 12, 2023 at 10:00 AM Yuanjian Li  wrote:
>>
>> +1 (non-binding)
>>
>> On Mon, Sep 11, 2023 at 09:36 Yuanjian Li wrote:
>>>
>>> @Peter Toth I've looked into the details of this issue, and it appears that 
>>> it's neither a regression in version 3.5.0 nor a correctness issue. It's a 
>>> bug related to a new feature. I think we can fix this in 3.5.1 and list it 
>>> as a known issue of the Scala client of Spark Connect in 3.5.0.
>>>
 On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:


 +1

 Signatures, digests, etc check out fine.
 Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes

 Regards,
 Mridul

 On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li  wrote:
>
> Please vote on releasing the following candidate(RC5) as Apache Spark 
> version 3.5.0.
>
>
> The vote is open until 11:59pm Pacific time Sep 11th and passes if a 
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
>
> The tag to be voted on is v3.5.0-rc5 (commit 
> ce5ddad990373636e94071e7cef2f31021add07b):
>
> https://github.com/apache/spark/tree/v3.5.0-rc5
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1449
>
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>
>
> The list of bug fixes going into 3.5.0 can be found at the following URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
>
> This release is using the release script of the tag v3.5.0-rc5.
>
>
>
> FAQ
>
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running on this release candidate, then
>
> reporting any regressions.
>
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks, in the Java/Scala
>
> you can add the staging repository to your projects resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out of date RC going forward).
>
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.5.0
>
>
> Committers should look at those and triage. Extremely important bug
>
> fixes, documentation, and API tweaks that impact compatibility should
>
> be worked on immediately. Everything else please retarget to an
>
> appropriate release.
>
>
> ==
>
> But my bug isn't fixed?
>
> ==
>
> In order to make timely releases, we will typically not hold the
>
> release unless the bug in question is a regression from the previous
>
> release. That being said, if there is something which is a regression
>
> that has not been correctly targeted please ping me or a committer to
>
> help target the issue.
>
>
> Thanks,
>
> Yuanjian Li




[VOTE][RESULT] Release Apache Spark 3.5.0 (RC5)

2023-09-12 Thread Yuanjian Li
The vote passes with 13 +1s (8 binding +1s).
Thank you all who helped with the release!

(* = binding)
+1:
- Mridul Muralidharan (*)
- Yuanjian Li
- Xiao Li (*)
- Gengliang Wang (*)
- Hyukjin Kwon (*)
- Ruifeng Zheng (*)
- Jungtaek Lim
- Wenchen Fan (*)
- Jia Fan
- Jie Yang
- Yuming Wang (*)
- Kent Yao
- Dongjoon Hyun (*)

+0: None

-1: None


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-12 Thread Dongjoon Hyun
+1

Dongjoon.

On 2023/09/12 03:38:37 Kent Yao wrote:
> +1 (non-binding), great work!
> 
> Kent Yao
> 
> On Tue, Sep 12, 2023 at 11:32 Yuming Wang wrote:
> >
> > +1.
> >
> > On Tue, Sep 12, 2023 at 10:57 AM yangjie01  
> > wrote:
> >>
> >> +1
> >>
> >>
> >>
> >> From: Jia Fan
> >> Date: Tuesday, September 12, 2023, 10:08
> >> To: Ruifeng Zheng
> >> Cc: Hyukjin Kwon, Xiao Li, Mridul Muralidharan, Peter Toth,
> >> Spark dev list, Yuanjian Li
> >> Subject: Re: [VOTE] Release Apache Spark 3.5.0 (RC5)
> >>
> >>
> >>
> >> +1
> >>
> >>
> >>
> >> On Tue, Sep 12, 2023 at 08:46 Ruifeng Zheng wrote:
> >>
> >> +1
> >>
> >>
> >>
> >> On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon  wrote:
> >>
> >> +1
> >>
> >>
> >>
> >> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li  wrote:
> >>
> >> +1
> >>
> >>
> >>
> >> Xiao
> >>
> >>
> >>
> >> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
> >>
> >> @Peter Toth I've looked into the details of this issue, and it appears 
> >> that it's neither a regression in version 3.5.0 nor a correctness issue. 
> >> It's a bug related to a new feature. I think we can fix this in 3.5.1 and 
> >> list it as a known issue of the Scala client of Spark Connect in 3.5.0.
> >>
> >> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
> >>
> >>
> >>
> >> +1
> >>
> >>
> >>
> >> Signatures, digests, etc check out fine.
> >>
> >> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
> >>
> >>
> >>
> >> Regards,
> >>
> >> Mridul
> >>
> >>
> >>
> >> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li  wrote:
> >>
> >> Please vote on releasing the following candidate(RC5) as Apache Spark 
> >> version 3.5.0.
> >>
> >>
> >>
> >> The vote is open until 11:59pm Pacific time Sep 11th and passes if a 
> >> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>
> >>
> >>
> >> [ ] +1 Release this package as Apache Spark 3.5.0
> >>
> >> [ ] -1 Do not release this package because ...
> >>
> >>
> >>
> >> To learn more about Apache Spark, please see http://spark.apache.org/
> >>
> >>
> >>
> >> The tag to be voted on is v3.5.0-rc5 (commit 
> >> ce5ddad990373636e94071e7cef2f31021add07b):
> >>
> >> https://github.com/apache/spark/tree/v3.5.0-rc5
> >>
> >>
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >>
> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
> >>
> >>
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >>
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >>
> >>
> >> The staging repository for this release can be found at:
> >>
> >> https://repository.apache.org/content/repositories/orgapachespark-1449
> >>
> >>
> >>
> >> The documentation corresponding to this release can be found at:
> >>
> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
> >>
> >>
> >>
> >> The list of bug fixes going into 3.5.0 can be found at the following URL:
> >>
> >> https://issues.apache.org/jira/projects/SPARK/versions/12352848
> >>
> >>
> >>
> >> This release is using the release script of the tag v3.5.0-rc5.
> >>
> >>
> >>
> >> FAQ
> >>
> >>
> >>
> >> =
> >>
> >> How can I help test this release?
> >>
> >> =
> >>
> >> If you are a Spark user, you can help us test this release by taking
> >>
> >> an existing Spark workload and running on this release candidate, then
> >>
> >> reporting any regressions.
> >>
> >>
> >>
> >> If you're working in PySpark you can set up a virtual env and install
> >>
> >> the current RC and see if anything important breaks, in the Java/Scala
> >>
> >> you can add the staging repository to your projects resolvers and test
> >>
> >> with the RC (make sure to clean up the artifact cache before/after so
> >>
> >> you don't end up building with an out of date RC going forward).
> >>
> >>
> >>
> >> ===
> >>
> >> What should happen to JIRA tickets still targeting 3.5.0?
> >>
> >> ===
> >>
> >> The current list of open tickets targeted at 3.5.0 can be found at:
> >>
> >> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> >> Version/s" = 3.5.0
> >>
> >>
> >>
> >> Committers should look at those and triage. Extremely important bug
> >>
> >> fixes, documentation, and API tweaks that impact compatibility should
> >>
> >> be worked on immediately. Everything else please retarget to an
> >>
> >> appropriate release.
> >>
> >>
> >>
> >> ==
> >>
> >> But my bug isn't fixed?
> >>
> >> ==
> >>
> >> In order to make timely releases, we will typically not hold the
> >>
> >> release unless the bug in question is a regression from the previous
> >>
> >> release. That being said, if there is something which is a regression
> >>
> >> that has not been correctly targeted please ping me or a committer to
> >>
> >> help target the issue.
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Yuanjian Li
> 

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Kent Yao
+1 (non-binding), great work!

Kent Yao

On Tue, Sep 12, 2023 at 11:32 Yuming Wang wrote:
>
> +1.
>
> On Tue, Sep 12, 2023 at 10:57 AM yangjie01  
> wrote:
>>
>> +1
>>
>>
>>
>> From: Jia Fan
>> Date: Tuesday, September 12, 2023, 10:08
>> To: Ruifeng Zheng
>> Cc: Hyukjin Kwon, Xiao Li, Mridul Muralidharan, Peter Toth,
>> Spark dev list, Yuanjian Li
>> Subject: Re: [VOTE] Release Apache Spark 3.5.0 (RC5)
>>
>>
>>
>> +1
>>
>>
>>
>> On Tue, Sep 12, 2023 at 08:46 Ruifeng Zheng wrote:
>>
>> +1
>>
>>
>>
>> On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon  wrote:
>>
>> +1
>>
>>
>>
>> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li  wrote:
>>
>> +1
>>
>>
>>
>> Xiao
>>
>>
>>
>> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
>>
>> @Peter Toth I've looked into the details of this issue, and it appears that 
>> it's neither a regression in version 3.5.0 nor a correctness issue. It's a 
>> bug related to a new feature. I think we can fix this in 3.5.1 and list it 
>> as a known issue of the Scala client of Spark Connect in 3.5.0.
>>
>> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>>
>>
>>
>> +1
>>
>>
>>
>> Signatures, digests, etc check out fine.
>>
>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>
>>
>>
>> Regards,
>>
>> Mridul
>>
>>
>>
>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li  wrote:
>>
>> Please vote on releasing the following candidate(RC5) as Apache Spark 
>> version 3.5.0.
>>
>>
>>
>> The vote is open until 11:59pm Pacific time Sep 11th and passes if a 
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>>
>>
>> [ ] +1 Release this package as Apache Spark 3.5.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>>
>>
>> The tag to be voted on is v3.5.0-rc5 (commit 
>> ce5ddad990373636e94071e7cef2f31021add07b):
>>
>> https://github.com/apache/spark/tree/v3.5.0-rc5
>>
>>
>>
>> The release files, including signatures, digests, etc. can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>>
>>
>>
>> Signatures used for Spark RCs can be found in this file:
>>
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>>
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1449
>>
>>
>>
>> The documentation corresponding to this release can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>>
>>
>>
>> The list of bug fixes going into 3.5.0 can be found at the following URL:
>>
>> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>>
>>
>>
>> This release is using the release script of the tag v3.5.0-rc5.
>>
>>
>>
>> FAQ
>>
>>
>>
>> =
>>
>> How can I help test this release?
>>
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>>
>> an existing Spark workload and running on this release candidate, then
>>
>> reporting any regressions.
>>
>>
>>
>> If you're working in PySpark you can set up a virtual env and install
>>
>> the current RC and see if anything important breaks, in the Java/Scala
>>
>> you can add the staging repository to your projects resolvers and test
>>
>> with the RC (make sure to clean up the artifact cache before/after so
>>
>> you don't end up building with an out of date RC going forward).
>>
>>
>>
>> ===
>>
>> What should happen to JIRA tickets still targeting 3.5.0?
>>
>> ===
>>
>> The current list of open tickets targeted at 3.5.0 can be found at:
>>
>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> Version/s" = 3.5.0
>>
>>
>>
>> Committers should look at those and triage. Extremely important bug
>>
>> fixes, documentation, and API tweaks that impact compatibility should
>>
>> be worked on immediately. Everything else please retarget to an
>>
>> appropriate release.
>>
>>
>>
>> ==
>>
>> But my bug isn't fixed?
>>
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>>
>> release unless the bug in question is a regression from the previous
>>
>> release. That being said, if there is something which is a regression
>>
>> that has not been correctly targeted please ping me or a committer to
>>
>> help target the issue.
>>
>>
>>
>> Thanks,
>>
>> Yuanjian Li




Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Yuming Wang
+1.

On Tue, Sep 12, 2023 at 10:57 AM yangjie01 
wrote:

> +1
>
>
>
> From: Jia Fan
> Date: Tuesday, September 12, 2023, 10:08
> To: Ruifeng Zheng
> Cc: Hyukjin Kwon, Xiao Li, Mridul Muralidharan, Peter Toth,
> Spark dev list, Yuanjian Li
> Subject: Re: [VOTE] Release Apache Spark 3.5.0 (RC5)
>
>
>
> +1
>
>
>
> On Tue, Sep 12, 2023 at 08:46 Ruifeng Zheng wrote:
>
> +1
>
>
>
> On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon  wrote:
>
> +1
>
>
>
> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li  wrote:
>
> +1
>
>
>
> Xiao
>
>
>
> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
>
> @Peter Toth  I've looked into the details of this
> issue, and it appears that it's neither a regression in version 3.5.0 nor a
> correctness issue. It's a bug related to a new feature. I think we can fix
> this in 3.5.1 and list it as a known issue of the Scala client of Spark
> Connect in 3.5.0.
>
> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>
>
>
> +1
>
>
>
> Signatures, digests, etc check out fine.
>
> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>
>
>
> Regards,
>
> Mridul
>
>
>
> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
> wrote:
>
> Please vote on releasing the following candidate(RC5) as Apache Spark
> version 3.5.0.
>
>
>
> The vote is open until 11:59pm Pacific time *Sep 11th* and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
>
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
>
>
> The tag to be voted on is v3.5.0-rc5 (commit
> ce5ddad990373636e94071e7cef2f31021add07b):
>
> https://github.com/apache/spark/tree/v3.5.0-rc5
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>
>
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1449
>
>
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>
>
>
> The list of bug fixes going into 3.5.0 can be found at the following URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
>
>
> This release is using the release script of the tag v3.5.0-rc5.
>
>
>
> FAQ
>
>
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running on this release candidate, then
>
> reporting any regressions.
>
>
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks, in the Java/Scala
>
> you can add the staging repository to your projects resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out of date RC going forward).
>
>
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.0
>
>
>
> Committers should look at those and triage. Extremely important bug
>
> fixes, documentation, and API tweaks that impact compatibility should
>
> be worked on immediately. Everything else please retarget to an
>
> appropriate release.
>
>
>
> ==
>
> But my bug isn't fixed?
>
> ==
>
> In order to make timely releases, we will typically not hold the
>
> release unless the bug in question is a regression from the previous
>
> release. That being said, if there is something which is a regression
>
> that has not been correctly targeted please ping me or a committer to
>
> help target the issue.
>
>
>
> Thanks,
>
> Yuanjian Li
>
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread yangjie01
+1

From: Jia Fan
Date: Tuesday, September 12, 2023, 10:08
To: Ruifeng Zheng
Cc: Hyukjin Kwon, Xiao Li, Mridul Muralidharan, Peter Toth,
Spark dev list, Yuanjian Li
Subject: Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

+1

On Tue, Sep 12, 2023 at 08:46 Ruifeng Zheng wrote:
+1

On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon wrote:
+1

On Tue, Sep 12, 2023 at 7:05 AM Xiao Li wrote:
+1

Xiao

On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
@Peter Toth I've looked into the details of this 
issue, and it appears that it's neither a regression in version 3.5.0 nor a 
correctness issue. It's a bug related to a new feature. I think we can fix this 
in 3.5.1 and list it as a known issue of the Scala client of Spark Connect in 
3.5.0.
On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:

+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li wrote:

Please vote on releasing the following candidate(RC5) as Apache Spark version 
3.5.0.


The vote is open until 11:59pm Pacific time Sep 11th and passes if a majority 
+1 PMC votes are cast, with a minimum of 3 +1 votes.


[ ] +1 Release this package as Apache Spark 3.5.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/


The tag to be voted on is v3.5.0-rc5 (commit 
ce5ddad990373636e94071e7cef2f31021add07b):

https://github.com/apache/spark/tree/v3.5.0-rc5


The release files, including signatures, digests, etc. can be found at:

https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/


Signatures used for Spark RCs can be found in this file:

https://dist.apache.org/repos/dist/dev/spark/KEYS


The staging repository for this release can be found at:

https://repository.apache.org/content/repositories/orgapachespark-1449


The documentation corresponding to this release can be found at:

https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/


The list of bug fixes going into 3.5.0 can be found at the following URL:

https://issues.apache.org/jira/projects/SPARK/versions/12352848


This release is using the release script of the tag v3.5.0-rc5.


FAQ


=

How can I help test this release?

=

If you are a Spark user, you can help us test this release by taking

an existing Spark workload and running on this release candidate, then

reporting any regressions.


If you're working in PySpark you can set up a virtual env and install

the current RC and see if anything important breaks, in the Java/Scala

you can add the staging repository to your projects resolvers and test

with the RC (make sure to clean up the artifact cache before/after so

you don't end up building with an out of date RC going forward).


===

What should happen to JIRA tickets still targeting 3.5.0?

===

The current list of open tickets targeted at 3.5.0 can be found at:

https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 3.5.0


Committers should look at those and triage. Extremely important bug

fixes, documentation, and API tweaks that impact compatibility should

be worked on immediately. Everything else please retarget to an

appropriate release.


==

But my bug isn't fixed?

==

In order to make timely releases, we will typically not hold the

release unless the bug in question is a regression from the previous

release. That being said, if there is something which is a regression

that has not been correctly targeted please ping me or a committer to

help target the issue.


Thanks,

Yuanjian Li


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Jia Fan
+1

On Tue, Sep 12, 2023 at 08:46 Ruifeng Zheng wrote:

> +1
>
> On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li  wrote:
>>
>>> +1
>>>
>>> Xiao
>>>
>>> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
>>>
 @Peter Toth  I've looked into the details of
 this issue, and it appears that it's neither a regression in version 3.5.0
 nor a correctness issue. It's a bug related to a new feature. I think we
 can fix this in 3.5.1 and list it as a known issue of the Scala client of
 Spark Connect in 3.5.0.

 On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:

>
> +1
>
> Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Phive -Pyarn -Pmesos
> -Pkubernetes
>
> Regards,
> Mridul
>
> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
> wrote:
>
>> Please vote on releasing the following candidate(RC5) as Apache Spark
>> version 3.5.0.
>>
>> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.5.0
>>
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.5.0-rc5 (commit
>> ce5ddad990373636e94071e7cef2f31021add07b):
>>
>> https://github.com/apache/spark/tree/v3.5.0-rc5
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>>
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1449
>>
>> The documentation corresponding to this release can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>>
>> The list of bug fixes going into 3.5.0 can be found at the following
>> URL:
>>
>> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>>
>> This release is using the release script of the tag v3.5.0-rc5.
>>
>>
>> FAQ
>>
>> =
>>
>> How can I help test this release?
>>
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>>
>> an existing Spark workload and running on this release candidate, then
>>
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>>
>> the current RC and see if anything important breaks, in the Java/Scala
>>
>> you can add the staging repository to your projects resolvers and test
>>
>> with the RC (make sure to clean up the artifact cache before/after so
>>
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>>
>> What should happen to JIRA tickets still targeting 3.5.0?
>>
>> ===
>>
>> The current list of open tickets targeted at 3.5.0 can be found at:
>>
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.5.0
>>
>> Committers should look at those and triage. Extremely important bug
>>
>> fixes, documentation, and API tweaks that impact compatibility should
>>
>> be worked on immediately. Everything else please retarget to an
>>
>> appropriate release.
>>
>> ==
>>
>> But my bug isn't fixed?
>>
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>>
>> release unless the bug in question is a regression from the previous
>>
>> release. That being said, if there is something which is a regression
>>
>> that has not been correctly targeted please ping me or a committer to
>>
>> help target the issue.
>>
>> Thanks,
>>
>> Yuanjian Li
>>
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Wenchen Fan
+1

On Tue, Sep 12, 2023 at 9:00 AM Yuanjian Li  wrote:

> +1 (non-binding)
>
> On Mon, Sep 11, 2023 at 09:36 Yuanjian Li wrote:
>
>> @Peter Toth  I've looked into the details of this
>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>> correctness issue. It's a bug related to a new feature. I think we can fix
>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>> Connect in 3.5.0.
>>
>> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC5) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 11th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc5 (commit
 ce5ddad990373636e94071e7cef2f31021add07b):

 https://github.com/apache/spark/tree/v3.5.0-rc5

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1449

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc5.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Jungtaek Lim
+1 (non-binding)

Thanks for driving this release and the patience on multiple RCs!

On Tue, Sep 12, 2023 at 10:00 AM Yuanjian Li  wrote:

> +1 (non-binding)
>
> On Mon, Sep 11, 2023 at 09:36 Yuanjian Li wrote:
>
>> @Peter Toth  I've looked into the details of this
>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>> correctness issue. It's a bug related to a new feature. I think we can fix
>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>> Connect in 3.5.0.
>>
>> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC5) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 11th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc5 (commit
 ce5ddad990373636e94071e7cef2f31021add07b):

 https://github.com/apache/spark/tree/v3.5.0-rc5

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1449

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc5.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Ruifeng Zheng
+1

On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon  wrote:

> +1
>
> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li  wrote:
>
>> +1
>>
>> Xiao
>>
>> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
>>
>>> @Peter Toth  I've looked into the details of this
>>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>>> correctness issue. It's a bug related to a new feature. I think we can fix
>>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>>> Connect in 3.5.0.
>>>
>>> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>>>

 +1

 Signatures, digests, etc check out fine.
 Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes

 Regards,
 Mridul

 On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
 wrote:

> Please vote on releasing the following candidate(RC5) as Apache Spark
> version 3.5.0.
>
> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.5.0-rc5 (commit
> ce5ddad990373636e94071e7cef2f31021add07b):
>
> https://github.com/apache/spark/tree/v3.5.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1449
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>
> The list of bug fixes going into 3.5.0 can be found at the following
> URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
> This release is using the release script of the tag v3.5.0-rc5.
>
>
> FAQ
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running on this release candidate, then
>
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks, in the Java/Scala
>
> you can add the staging repository to your projects resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out of date RC going forward).
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.0
>
> Committers should look at those and triage. Extremely important bug
>
> fixes, documentation, and API tweaks that impact compatibility should
>
> be worked on immediately. Everything else please retarget to an
>
> appropriate release.
>
> ==
>
> But my bug isn't fixed?
>
> ==
>
> In order to make timely releases, we will typically not hold the
>
> release unless the bug in question is a regression from the previous
>
> release. That being said, if there is something which is a regression
>
> that has not been correctly targeted please ping me or a committer to
>
> help target the issue.
>
> Thanks,
>
> Yuanjian Li
>



Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Hyukjin Kwon
+1

On Tue, Sep 12, 2023 at 7:05 AM Xiao Li  wrote:

> +1
>
> Xiao
>
> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
>
>> @Peter Toth  I've looked into the details of this
>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>> correctness issue. It's a bug related to a new feature. I think we can fix
>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>> Connect in 3.5.0.
>>
>> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC5) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 11th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc5 (commit
 ce5ddad990373636e94071e7cef2f31021add07b):

 https://github.com/apache/spark/tree/v3.5.0-rc5

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1449

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc5.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Gengliang Wang
+1

On Mon, Sep 11, 2023 at 11:28 AM Xiao Li  wrote:

> +1
>
> Xiao
>
> On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:
>
>> @Peter Toth  I've looked into the details of this
>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>> correctness issue. It's a bug related to a new feature. I think we can fix
>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>> Connect in 3.5.0.
>>
>> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>>> wrote:
>>>
 Please vote on releasing the following candidate(RC5) as Apache Spark
 version 3.5.0.

 The vote is open until 11:59pm Pacific time Sep 11th and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.5.0

 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v3.5.0-rc5 (commit
 ce5ddad990373636e94071e7cef2f31021add07b):

 https://github.com/apache/spark/tree/v3.5.0-rc5

 The release files, including signatures, digests, etc. can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/

 Signatures used for Spark RCs can be found in this file:

 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1449

 The documentation corresponding to this release can be found at:

 https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/

 The list of bug fixes going into 3.5.0 can be found at the following
 URL:

 https://issues.apache.org/jira/projects/SPARK/versions/12352848

 This release is using the release script of the tag v3.5.0-rc5.


 FAQ

 =

 How can I help test this release?

 =

 If you are a Spark user, you can help us test this release by taking

 an existing Spark workload and running on this release candidate, then

 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install

 the current RC and see if anything important breaks, in the Java/Scala

 you can add the staging repository to your projects resolvers and test

 with the RC (make sure to clean up the artifact cache before/after so

 you don't end up building with an out of date RC going forward).

 ===

 What should happen to JIRA tickets still targeting 3.5.0?

 ===

 The current list of open tickets targeted at 3.5.0 can be found at:

 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.5.0

 Committers should look at those and triage. Extremely important bug

 fixes, documentation, and API tweaks that impact compatibility should

 be worked on immediately. Everything else please retarget to an

 appropriate release.

 ==

 But my bug isn't fixed?

 ==

 In order to make timely releases, we will typically not hold the

 release unless the bug in question is a regression from the previous

 release. That being said, if there is something which is a regression

 that has not been correctly targeted please ping me or a committer to

 help target the issue.

 Thanks,

 Yuanjian Li

>>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Xiao Li
+1

Xiao

On Mon, Sep 11, 2023 at 10:53 Yuanjian Li wrote:

> @Peter Toth  I've looked into the details of this
> issue, and it appears that it's neither a regression in version 3.5.0 nor a
> correctness issue. It's a bug related to a new feature. I think we can fix
> this in 3.5.1 and list it as a known issue of the Scala client of Spark
> Connect in 3.5.0.
>
> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>> wrote:
>>
>>> Please vote on releasing the following candidate(RC5) as Apache Spark
>>> version 3.5.0.
>>>
>>> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.5.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.5.0-rc5 (commit
>>> ce5ddad990373636e94071e7cef2f31021add07b):
>>>
>>> https://github.com/apache/spark/tree/v3.5.0-rc5
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>>
>>> https://repository.apache.org/content/repositories/orgapachespark-1449
>>>
>>> The documentation corresponding to this release can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>>>
>>> The list of bug fixes going into 3.5.0 can be found at the following URL:
>>>
>>> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>>>
>>> This release is using the release script of the tag v3.5.0-rc5.
>>>
>>>
>>> FAQ
>>>
>>> =
>>>
>>> How can I help test this release?
>>>
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>>
>>> an existing Spark workload and running on this release candidate, then
>>>
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>>
>>> the current RC and see if anything important breaks, in the Java/Scala
>>>
>>> you can add the staging repository to your projects resolvers and test
>>>
>>> with the RC (make sure to clean up the artifact cache before/after so
>>>
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>>
>>> What should happen to JIRA tickets still targeting 3.5.0?
>>>
>>> ===
>>>
>>> The current list of open tickets targeted at 3.5.0 can be found at:
>>>
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.5.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>>
>>> fixes, documentation, and API tweaks that impact compatibility should
>>>
>>> be worked on immediately. Everything else please retarget to an
>>>
>>> appropriate release.
>>>
>>> ==
>>>
>>> But my bug isn't fixed?
>>>
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>>
>>> release unless the bug in question is a regression from the previous
>>>
>>> release. That being said, if there is something which is a regression
>>>
>>> that has not been correctly targeted please ping me or a committer to
>>>
>>> help target the issue.
>>>
>>> Thanks,
>>>
>>> Yuanjian Li
>>>
>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Peter Toth
Thanks Yuanjian. Please disregard my -1 then.

On Mon, Sep 11, 2023 at 18:36 Yuanjian Li wrote:

> @Peter Toth  I've looked into the details of this
> issue, and it appears that it's neither a regression in version 3.5.0 nor a
> correctness issue. It's a bug related to a new feature. I think we can fix
> this in 3.5.1 and list it as a known issue of the Scala client of Spark
> Connect in 3.5.0.
>
> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>> wrote:
>>
>>> Please vote on releasing the following candidate(RC5) as Apache Spark
>>> version 3.5.0.
>>>
>>> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.5.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.5.0-rc5 (commit
>>> ce5ddad990373636e94071e7cef2f31021add07b):
>>>
>>> https://github.com/apache/spark/tree/v3.5.0-rc5
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>>
>>> https://repository.apache.org/content/repositories/orgapachespark-1449
>>>
>>> The documentation corresponding to this release can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>>>
>>> The list of bug fixes going into 3.5.0 can be found at the following URL:
>>>
>>> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>>>
>>> This release is using the release script of the tag v3.5.0-rc5.
>>>
>>>
>>> FAQ
>>>
>>> =
>>>
>>> How can I help test this release?
>>>
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>>
>>> an existing Spark workload and running on this release candidate, then
>>>
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>>
>>> the current RC and see if anything important breaks, in the Java/Scala
>>>
>>> you can add the staging repository to your projects resolvers and test
>>>
>>> with the RC (make sure to clean up the artifact cache before/after so
>>>
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>>
>>> What should happen to JIRA tickets still targeting 3.5.0?
>>>
>>> ===
>>>
>>> The current list of open tickets targeted at 3.5.0 can be found at:
>>>
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.5.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>>
>>> fixes, documentation, and API tweaks that impact compatibility should
>>>
>>> be worked on immediately. Everything else please retarget to an
>>>
>>> appropriate release.
>>>
>>> ==
>>>
>>> But my bug isn't fixed?
>>>
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>>
>>> release unless the bug in question is a regression from the previous
>>>
>>> release. That being said, if there is something which is a regression
>>>
>>> that has not been correctly targeted please ping me or a committer to
>>>
>>> help target the issue.
>>>
>>> Thanks,
>>>
>>> Yuanjian Li
>>>
>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Yuanjian Li
+1 (non-binding)

On Mon, Sep 11, 2023 at 09:36 Yuanjian Li wrote:

> @Peter Toth  I've looked into the details of this
> issue, and it appears that it's neither a regression in version 3.5.0 nor a
> correctness issue. It's a bug related to a new feature. I think we can fix
> this in 3.5.1 and list it as a known issue of the Scala client of Spark
> Connect in 3.5.0.
>
> On Sun, Sep 10, 2023 at 04:12 Mridul Muralidharan wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>> On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
>> wrote:
>>
>>> Please vote on releasing the following candidate(RC5) as Apache Spark
>>> version 3.5.0.
>>>
>>> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.5.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.5.0-rc5 (commit
>>> ce5ddad990373636e94071e7cef2f31021add07b):
>>>
>>> https://github.com/apache/spark/tree/v3.5.0-rc5
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>>
>>> https://repository.apache.org/content/repositories/orgapachespark-1449
>>>
>>> The documentation corresponding to this release can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>>>
>>> The list of bug fixes going into 3.5.0 can be found at the following URL:
>>>
>>> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>>>
>>> This release is using the release script of the tag v3.5.0-rc5.
>>>
>>>
>>> FAQ
>>>
>>> =
>>>
>>> How can I help test this release?
>>>
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>>
>>> an existing Spark workload and running on this release candidate, then
>>>
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>>
>>> the current RC and see if anything important breaks, in the Java/Scala
>>>
>>> you can add the staging repository to your projects resolvers and test
>>>
>>> with the RC (make sure to clean up the artifact cache before/after so
>>>
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>>
>>> What should happen to JIRA tickets still targeting 3.5.0?
>>>
>>> ===
>>>
>>> The current list of open tickets targeted at 3.5.0 can be found at:
>>>
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.5.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>>
>>> fixes, documentation, and API tweaks that impact compatibility should
>>>
>>> be worked on immediately. Everything else please retarget to an
>>>
>>> appropriate release.
>>>
>>> ==
>>>
>>> But my bug isn't fixed?
>>>
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>>
>>> release unless the bug in question is a regression from the previous
>>>
>>> release. That being said, if there is something which is a regression
>>>
>>> that has not been correctly targeted please ping me or a committer to
>>>
>>> help target the issue.
>>>
>>> Thanks,
>>>
>>> Yuanjian Li
>>>
>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Yuanjian Li
@Peter Toth  I've looked into the details of this
issue, and it appears that it's neither a regression in version 3.5.0 nor a
correctness issue. It's a bug related to a new feature. I think we can fix
this in 3.5.1 and list it as a known issue of the Scala client of Spark
Connect in 3.5.0.

Mridul Muralidharan wrote on Sun, Sep 10, 2023 at 04:12:

>
> +1
>
> Signatures, digests, etc. check out fine.
> Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes
>
> Regards,
> Mridul
>
>


unsubscribe

2023-09-11 Thread Sairam Natarajan
unsubscribe


unsubscribe

2023-09-10 Thread Cenk Ariöz
unsubscribe


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-10 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li  wrote:

> Please vote on releasing the following candidate (RC5) as Apache Spark
> version 3.5.0.
>
> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.5.0-rc5 (commit
> ce5ddad990373636e94071e7cef2f31021add07b):
> https://github.com/apache/spark/tree/v3.5.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1449
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>
> The list of bug fixes going into 3.5.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
> This release is using the release script of the tag v3.5.0-rc5.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.5.0?
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
>
> Yuanjian Li
>


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-10 Thread Peter Toth
Hi Yuanjian,

Sorry, -1 from me. Let's not introduce this bug in 3.5:
https://issues.apache.org/jira/browse/SPARK-45109 /
https://github.com/apache/spark/pull/42863

Best,
Peter

Yuanjian Li wrote on Sun, Sep 10, 2023 at 10:39:

> Yes, SPARK-44805 has been included. For the commits from RC4 to RC5,
> please refer to https://github.com/apache/spark/commits/v3.5.0-rc5.
>
> Mich Talebzadeh wrote on Sat, Sep 9, 2023 at 08:09:
>
>> Apologies that should read ... release 3.5.0 (RC4) plus ..
>>
>> Mich Talebzadeh,
>> Distinguished Technologist, Solutions Architect & Engineer
>> London
>> United Kingdom
>>
>> On Sat, 9 Sept 2023 at 15:58, Mich Talebzadeh 
>> wrote:
>>
>>> Hi,
>>>
>>> Can you please confirm that this cut is release 3.4.0 plus the resolved
>>> Jira  https://issues.apache.org/jira/browse/SPARK-44805 which was
>>> already fixed yesterday?
>>>
>>> Nothing else I believe?
>>>
>>> Thanks
>>>
>>> Mich
>>>