Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-03-01 Thread Hyukjin Kwon
Oh yeah. I plan to fix it for the final release. Thanks for pointing that
out.

On Tue, 2 Mar 2021, 12:14 Kazuaki Ishizaki wrote:

> Hi Hyukjin,
> Thanks for your effort.
>
> One question: Do you automatically update the URLs to Spark documents in
> "the change of Behavior section"? Currently, they refer to
> "https://spark.apache.org/docs/3.0.0/...".
> I think that they should refer to "https://spark.apache.org/docs/3.1.1/...".
>
> Regards,
> Kazuaki Ishizaki,
>
>
>
> From: Hyukjin Kwon
> To: dev
> Cc: Dongjoon Hyun, Jungtaek Lim <kabhwan.opensou...@gmail.com>, Tom Graves
> Date: 2021/03/02 11:20
> Subject: Re: Please take a look at the draft of the Spark 3.1.1
> release notes
>
>
>
> Thanks guys for suggestions and fixes. Now I feel pretty confident about
> the release notes :-).
> I will start uploading and preparing to announce Spark 3.1.1.
>
> On Tue, Mar 2, 2021 at 7:29 AM, Tom Graves <tgraves...@yahoo.com> wrote:
> Thanks Hyukjin, overall they look good to me.
>
> Tom
> On Saturday, February 27, 2021, 05:00:42 PM CST, Jungtaek Lim
> <kabhwan.opensou...@gmail.com> wrote:
>
>
> Thanks Hyukjin! I've only looked into the SS part, and added a comment.
> Otherwise it looks great!
>
> On Sat, Feb 27, 2021 at 7:12 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
> Thank you for sharing, Hyukjin!
>
> Dongjoon.
>
> On Sat, Feb 27, 2021 at 12:36 AM Hyukjin Kwon <gurwls...@gmail.com>
> wrote:
> Hi all,
>
> I am preparing to publish and announce Spark 3.1.1.
> This is the draft of the release note, and I plan to edit a bit more and
> use it as the final release note.
> Please take a look and let me know if I missed any major changes or
> something.
>
>
> https://docs.google.com/document/d/1x6zzgRsZ4u1DgUh1XpGzX914CZbsHeRYpbqZ-PV6wdQ/edit?usp=sharing
>
> Thanks.
>
>


Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-03-01 Thread Kazuaki Ishizaki
Hi Hyukjin,
Thanks for your effort.

One question: Do you automatically update the URLs to Spark documents in
"the change of Behavior section"? Currently, they refer to
"https://spark.apache.org/docs/3.0.0/...". I think that they should refer
to "https://spark.apache.org/docs/3.1.1/...".

Regards,
Kazuaki Ishizaki, 



From: Hyukjin Kwon
To: dev
Cc: Dongjoon Hyun, Jungtaek Lim, Tom Graves
Date: 2021/03/02 11:20
Subject: Re: Please take a look at the draft of the Spark 3.1.1
release notes



Thanks guys for suggestions and fixes. Now I feel pretty confident about 
the release notes :-).
I will start uploading and preparing to announce Spark 3.1.1.

On Tue, Mar 2, 2021 at 7:29 AM, Tom Graves wrote:
Thanks Hyukjin, overall they look good to me.

Tom
On Saturday, February 27, 2021, 05:00:42 PM CST, Jungtaek Lim
<kabhwan.opensou...@gmail.com> wrote:


Thanks Hyukjin! I've only looked into the SS part, and added a comment. 
Otherwise it looks great! 

On Sat, Feb 27, 2021 at 7:12 PM Dongjoon Hyun wrote:
Thank you for sharing, Hyukjin!

Dongjoon.

On Sat, Feb 27, 2021 at 12:36 AM Hyukjin Kwon  wrote:
Hi all,

I am preparing to publish and announce Spark 3.1.1.
This is the draft of the release note, and I plan to edit a bit more and 
use it as the final release note.
Please take a look and let me know if I missed any major changes or 
something.

https://docs.google.com/document/d/1x6zzgRsZ4u1DgUh1XpGzX914CZbsHeRYpbqZ-PV6wdQ/edit?usp=sharing

Thanks.




Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-03-01 Thread Hyukjin Kwon
Thanks guys for suggestions and fixes. Now I feel pretty confident about
the release notes :-).
I will start uploading and preparing to announce Spark 3.1.1.

On Tue, Mar 2, 2021 at 7:29 AM, Tom Graves wrote:

> Thanks Hyukjin, overall they look good to me.
>
> Tom
> On Saturday, February 27, 2021, 05:00:42 PM CST, Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>
> Thanks Hyukjin! I've only looked into the SS part, and added a comment.
> Otherwise it looks great!
>
> On Sat, Feb 27, 2021 at 7:12 PM Dongjoon Hyun 
> wrote:
>
> Thank you for sharing, Hyukjin!
>
> Dongjoon.
>
> On Sat, Feb 27, 2021 at 12:36 AM Hyukjin Kwon  wrote:
>
> Hi all,
>
> I am preparing to publish and announce Spark 3.1.1.
> This is the draft of the release note, and I plan to edit a bit more and
> use it as the final release note.
> Please take a look and let me know if I missed any major changes or
> something.
>
>
> https://docs.google.com/document/d/1x6zzgRsZ4u1DgUh1XpGzX914CZbsHeRYpbqZ-PV6wdQ/edit?usp=sharing
>
> Thanks.
>
>


Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-03-01 Thread Tom Graves
Thanks Hyukjin, overall they look good to me.

Tom

On Saturday, February 27, 2021, 05:00:42 PM CST, Jungtaek Lim wrote:

Thanks Hyukjin! I've only looked into the SS part, and added a comment.
Otherwise it looks great!

On Sat, Feb 27, 2021 at 7:12 PM Dongjoon Hyun wrote:

Thank you for sharing, Hyukjin!

Dongjoon.

On Sat, Feb 27, 2021 at 12:36 AM Hyukjin Kwon wrote:

Hi all,

I am preparing to publish and announce Spark 3.1.1.
This is the draft of the release note, and I plan to edit a bit more and use it 
as the final release note.
Please take a look and let me know if I missed any major changes or something.

https://docs.google.com/document/d/1x6zzgRsZ4u1DgUh1XpGzX914CZbsHeRYpbqZ-PV6wdQ/edit?usp=sharing

Thanks.

  

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-01 Thread Ryan Blue
Thanks for adding your perspective, Erik!

> If the input is string type but the UDF implementation calls row.getLong(0),
> it returns wrong data

I think this is misleading. It is true for UnsafeRow, but there is no
reason why InternalRow should return incorrect values.

The implementation in GenericInternalRow would throw a ClassCastException.
I don’t think that using a row is a bad option simply because UnsafeRow is
unsafe.

It’s unlikely that UnsafeRow would be used to pass the data. The
implementation would evaluate each argument expression and set the result
in a generic row, then pass that row to the UDF. We can use whatever
implementation we choose to provide better guarantees than unsafe.
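
A minimal Java sketch of the behavior described above, not Spark code: SketchRow and RowUdfSketch are hypothetical stand-ins for a generic row that stores boxed objects and a UDF that reads it. The only assumption is that the typed getter is implemented as a cast, which is what makes a type mismatch fail loudly instead of returning wrong data.

```java
/** Hypothetical stand-in for a generic row: values are boxed objects, so typed getters cast. */
final class SketchRow {
    private final Object[] values;

    SketchRow(Object... values) {
        this.values = values;
    }

    long getLong(int ordinal) {
        // A String (or any other non-Long value) at this position makes the cast
        // throw ClassCastException rather than reinterpreting the stored data.
        return (Long) values[ordinal];
    }
}

final class RowUdfSketch {
    /** A UDF body that assumes its first argument is a long. */
    static long plusOne(SketchRow args) {
        return args.getLong(0) + 1L;
    }

    /** Returns true if calling the UDF on the given row fails with ClassCastException. */
    static boolean failsWithClassCast(SketchRow args) {
        try {
            plusOne(args);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Calling RowUdfSketch.plusOne(new SketchRow("41")) fails with a ClassCastException, while a row holding a Long is processed normally.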

> I think we should consider query-compile-time checks as nearly-as-good as
> Java-compile-time checks for the purposes of safety.

I don’t think I agree with this. A failure at query analysis time vs
runtime still requires going back to a separate project, fixing something,
and rebuilding. The time needed to fix a problem goes up significantly vs.
compile-time checks. And that is even worse if the UDF is maintained by
someone else.

I think we also need to consider how common it would be that a use case can
have the query-compile-time checks. Going through this in more detail below
makes me think that it is unlikely that these checks would be used often
because of the limitations of using an interface with type erasure.

> I believe that Wenchen’s proposal will provide stronger query-compile-time
> safety

The proposal could have better safety for each argument, assuming that we
detect failures by looking at the parameter types using reflection in the
analyzer. But we don’t do that for any of the similar UDFs today so I’m
skeptical that this would actually be a high enough priority to implement.

As Erik pointed out, type erasure also limits the effectiveness. You can’t
implement ScalarFunction2 twice with different type arguments.
You can handle those cases using InternalRow or you can handle them using
VarargScalarFunction. That forces many use cases into varargs with
Object, where you don’t get any of the proposed analyzer benefits and lose
compile-time checks. The only time the additional checks (if implemented)
would help is when only one set of argument types is needed because
implementing ScalarFunction with Object type arguments defeats the purpose.
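
The erasure limitation can be sketched in plain Java. BinaryFn below is a hypothetical two-argument interface loosely modeled on the ScalarFunction2 named in this thread, not the proposed API, and the class names are made up for illustration.

```java
/** Hypothetical two-argument function interface (an assumption, not the proposed API). */
interface BinaryFn<A, B> {
    Object call(A left, B right);
}

// This does not compile: both parents erase to the same interface BinaryFn,
// and Java rejects inheriting one interface with different type arguments.
// final class Add implements BinaryFn<Integer, Integer>, BinaryFn<Long, Long> { ... }

/** One class per set of argument types compiles, but covers only that one set. */
final class AddLongs implements BinaryFn<Long, Long> {
    @Override
    public Object call(Long left, Long right) {
        return left + right;
    }
}

/** Falling back to Object covers several sets, but the checks move to runtime. */
final class AddAny implements BinaryFn<Object, Object> {
    @Override
    public Object call(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            return (Integer) left + (Integer) right;
        }
        if (left instanceof Long && right instanceof Long) {
            return (Long) left + (Long) right;
        }
        throw new IllegalArgumentException("unsupported argument types");
    }
}
```

AddAny accepts both argument sets, but a call such as new AddAny().call("1", "2") is only rejected at runtime, which is the loss of compile-time checking described above.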

It’s worth noting that safety for the magic methods would be identical
between the two options, so the trade-off to consider is for varargs and
non-codegen cases. Combining the limitations discussed, this has better
safety guarantees only if you need just one set of types for each number of
arguments and are using the non-codegen path. Since varargs is one of the
primary reasons to use this API, I don’t think that it is a good idea
to use Object[] instead of InternalRow.
-- 
Ryan Blue
Software Engineer
Netflix