Re: Contributions and help needed in SPARK-40005

2022-08-30 Thread Khalid Mammadov
Will do, thanks!



Re: Contributions and help needed in SPARK-40005

2022-08-30 Thread Hyukjin Kwon
Oh, that's a mistake. Please just go ahead and reuse that JIRA :-).
You can just create a PR reusing the same JIRA ID for functions.py.



Re: Contributions and help needed in SPARK-40005

2022-08-30 Thread Khalid Mammadov
Hi @Hyukjin Kwon 

I see you have resolved the JIRA, but I still have more to do in
functions.py (only 50% done). Shall I create a new JIRA for each new PR,
or is it OK to reuse this one?



Re: Contributions and help needed in SPARK-40005

2022-08-19 Thread Khalid Mammadov
Will do, thanks!




Re: Contributions and help needed in SPARK-40005

2022-08-19 Thread Hyukjin Kwon
Sure, that would be great.

I did the first 25 functions in functions.py. Please go ahead with the rest
of them.
You can create a PR with a title such
as [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions
examples self-contained (part 2, 25 functions)

Thanks!
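
The batching idea above (25 functions per PR, numbered parts, one shared
JIRA ID) can be sketched as follows. This is only an illustration: the
helper name batch_titles, its parameters, and the placeholder function
names are hypothetical and are not part of Spark or its tooling. Only the
title pattern is taken from the message.

```python
def batch_titles(functions, jira_id, batch_size=25, first_part=1):
    """Split function names into batches and build one PR title per batch.

    Hypothetical helper for illustration only; the title pattern follows
    the example title given in the message above.
    """
    titles = []
    for start in range(0, len(functions), batch_size):
        batch = functions[start:start + batch_size]
        # Part numbers continue from `first_part` (part 1 is already done).
        part = first_part + start // batch_size
        titles.append(
            f"[{jira_id}][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions "
            f"examples self-contained (part {part}, {len(batch)} functions)"
        )
    return titles


# Placeholder names standing in for the functions that are still pending.
remaining = [f"func_{i}" for i in range(60)]
titles = batch_titles(remaining, "SPARK-40142", first_part=2)
for title in titles:
    print(title)
```

With 60 placeholder names this yields three titles (parts 2 to 4), the last
one covering the 10 leftover functions.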



Re: Contributions and help needed in SPARK-40005

2022-08-19 Thread Khalid Mammadov
I am picking up "functions.py" if no one else already has.



Re: Contributions and help needed in SPARK-40005

2022-08-18 Thread Khalid Mammadov
I thought it was all finished (I checked a few). Do you have a list of the
remaining 50%?
Happy to contribute 😊



Re: Contributions and help needed in SPARK-40005

2022-08-18 Thread Hyukjin Kwon
We're halfway, roughly 50%. More contributions would be very helpful.
If the file is too large, feel free to split it into multiple
parts (e.g., https://github.com/apache/spark/pull/37575).



Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Qian SUN
Sure, I will do it. SPARK-40010 has been created to track progress.

--
Best!
Qian SUN


Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Hyukjin Kwon
Please go ahead. It would be much appreciated.



Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Qian SUN
Hi Hyukjin

I would like to do some work and pick up *Window.py* if possible.

Thanks,
Qian


-- 
Best!
Qian SUN


Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Hyukjin Kwon
Thanks Khalid for taking a look.



Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Khalid Mammadov
Hi Hyukjin
That's a great initiative. Here is a PR that addresses one of those issues
and is waiting for review: https://github.com/apache/spark/pull/37408

Perhaps it would also be good to track these pending issues somewhere to
avoid duplicated effort.

For example, I would like to pick up *union* and *union all* if no one has
already.

Thanks,
Khalid


On Mon, Aug 8, 2022 at 1:44 PM Hyukjin Kwon  wrote:

> Hi all,
>
> I am trying to improve the PySpark documentation, especially:
>
> - Make the examples self-contained, e.g.,
>   https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
> - Document Parameters
>   https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot.
>   Many APIs in PySpark are missing documented parameters, e.g.,
>   DataFrame.union
>
> Here is one example PR I am working on:
> https://github.com/apache/spark/pull/37437
> I can't do it all by myself. Any help, review, and contributions would be
> welcome and appreciated.
>
> Thank you all in advance.
>
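
The two goals in the original message (self-contained examples and
documented parameters) can be illustrated with a small sketch. The function
union_rows below is a hypothetical stand-in that works on plain Python
lists, so it runs without Spark. It is not the PySpark API and not the
docstring from any of the linked PRs. The section layout assumes the
numpydoc-style Parameters/Returns/Examples headings used on the linked
pandas page.

```python
import doctest


def union_rows(left, right):
    """Return the rows of ``left`` followed by the rows of ``right``.

    Hypothetical stand-in for an API such as ``DataFrame.union``; it uses
    plain lists so that the example needs no Spark installation.

    Parameters
    ----------
    left : list of tuple
        Rows that come first in the result.
    right : list of tuple
        Rows appended after ``left``. Duplicates are kept.

    Returns
    -------
    list of tuple
        A new list; neither input is modified.

    Examples
    --------
    >>> left = [(1, "a"), (2, "b")]
    >>> right = [(2, "b"), (3, "c")]
    >>> union_rows(left, right)
    [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
    """
    return list(left) + list(right)


if __name__ == "__main__":
    # Run the docstring example as a test. It passes only if the example
    # creates all of its own inputs and its shown output is exact.
    print(doctest.testmod())
```

The example is self-contained because it builds its own inputs inside the
docstring instead of relying on variables defined elsewhere, which is what
lets doctest (or a reader) run it as-is.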