Re: 4.0.0-preview1 test report: running on Yarn

2024-06-18 Thread George Magiros
Thank you all so much for the kind words of encouragement on my first test
report.  As a follow-up, I ran all my HDFS and Yarn nodes on Java 8,
including my NodeManagers.  I then modified Spark's
conf/spark-defaults.conf according to Mr. Pan's prior post, and it worked:
I was able to submit SparkPi and my PySpark code using 4.0.0-preview1 to
Yarn, successfully deploying in both client and cluster mode.  Without the
changes, Yarn would otherwise have thrown an UnsupportedClassVersionError
for org/apache/spark/deploy/yarn/ExecutorLauncher.  George
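
For reference, the relevant spark-defaults.conf change from Mr. Pan's post
below boils down to pointing the YARN ApplicationMaster and executors at a
Java 17 installation; a minimal sketch, assuming Java 17 lives under
/opt/openjdk-17 on every node:

    # $SPARK_CONF_DIR/spark-defaults.conf
    # JAVA_HOME for the YARN ApplicationMaster (client and cluster mode)
    spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17
    # JAVA_HOME for the executors launched by the NodeManagers
    spark.executorEnv.JAVA_HOME=/opt/openjdk-17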

On Tue, Jun 18, 2024 at 6:26 AM Cheng Pan  wrote:

> FYI, I have submitted SPARK-48651
> (https://github.com/apache/spark/pull/47010) to update the Spark on YARN
> docs for JDK configuration; looking forward to your feedback.
>
> Thanks,
> Cheng Pan
>
>
> On Jun 18, 2024, at 02:00, George Magiros  wrote:
>
> I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn
> using 4.0.0-preview1.  However, I got it to work only after fixing an issue
> with the Yarn NodeManagers (Hadoop v3.3.6 and v3.4.0).  Namely, the issue
> was:
> 1. If the NodeManagers used Java 11, Yarn threw an error about not finding
> the jdk.incubator.vector module.
> 2. If the NodeManagers used Java 17, which has the jdk.incubator.vector
> module, Yarn threw a reflection error about a class not being found.
>
> To resolve the error and successfully calculate pi,
> 1. I ran Java 17 on the NodeManagers and
> 2. added 'export
> HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"' to their
> conf/hadoop-env.sh file.
>
> George
>
>
>


Re: 4.0.0-preview1 test report: running on Yarn

2024-06-18 Thread Cheng Pan
FYI, I have submitted SPARK-48651 (https://github.com/apache/spark/pull/47010)
to update the Spark on YARN docs for JDK configuration; looking forward to your
feedback.

Thanks,
Cheng Pan


> On Jun 18, 2024, at 02:00, George Magiros  wrote:
> 
> I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn
> using 4.0.0-preview1.  However, I got it to work only after fixing an issue
> with the Yarn NodeManagers (Hadoop v3.3.6 and v3.4.0).  Namely, the issue was:
> 1. If the NodeManagers used Java 11, Yarn threw an error about not finding
> the jdk.incubator.vector module.
> 2. If the NodeManagers used Java 17, which has the jdk.incubator.vector
> module, Yarn threw a reflection error about a class not being found.
> 
> To resolve the error and successfully calculate pi,
> 1. I ran Java 17 on the NodeManagers and
> 2. added 'export HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"'
> to their conf/hadoop-env.sh file.
> 
> George
> 



Re: 4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread Cheng Pan
You don’t need to upgrade Java for HDFS and YARN. Just keep using Java 8 for 
Hadoop and set JAVA_HOME to Java 17 for Spark applications[1].

0. Install Java 17 on all nodes, for example, under /opt/openjdk-17

1. Modify $SPARK_CONF_DIR/spark-env.sh
export JAVA_HOME=/opt/openjdk-17

2. Modify $SPARK_CONF_DIR/spark-defaults.conf
spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17
spark.executorEnv.JAVA_HOME=/opt/openjdk-17

[1] 
https://github.com/awesome-kyuubi/hadoop-testing/commit/9f7c0d7388dfc7fbe6e4658515a6c28d5ba93c8e

Thanks,
Cheng Pan
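
With the two conf files above in place, a submission like the following should
run the AM and executors on Java 17 even though the HDFS/YARN daemons stay on
Java 8; a minimal sketch, with the examples jar path and version taken from a
typical 4.0.0-preview1 layout (adjust to your installation):

    # Submit SparkPi to YARN in cluster mode; the AM and executors pick up
    # JAVA_HOME=/opt/openjdk-17 from spark-defaults.conf
    $SPARK_HOME/bin/spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      $SPARK_HOME/examples/jars/spark-examples_2.13-4.0.0-preview1.jar 1000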


> On Jun 18, 2024, at 02:00, George Magiros  wrote:
> 
> I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn
> using 4.0.0-preview1.  However, I got it to work only after fixing an issue
> with the Yarn NodeManagers (Hadoop v3.3.6 and v3.4.0).  Namely, the issue was:
> 1. If the NodeManagers used Java 11, Yarn threw an error about not finding
> the jdk.incubator.vector module.
> 2. If the NodeManagers used Java 17, which has the jdk.incubator.vector
> module, Yarn threw a reflection error about a class not being found.
> 
> To resolve the error and successfully calculate pi,
> 1. I ran Java 17 on the NodeManagers and
> 2. added 'export HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"'
> to their conf/hadoop-env.sh file.
> 
> George
> 



Re: 4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread Wenchen Fan
Thanks for sharing! Yea Spark 4.0 is built using Java 17.

On Tue, Jun 18, 2024 at 5:07 AM George Magiros  wrote:

> I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn
> using 4.0.0-preview1.  However, I got it to work only after fixing an issue
> with the Yarn NodeManagers (Hadoop v3.3.6 and v3.4.0).  Namely, the issue
> was:
> 1. If the NodeManagers used Java 11, Yarn threw an error about not finding
> the jdk.incubator.vector module.
> 2. If the NodeManagers used Java 17, which has the jdk.incubator.vector
> module, Yarn threw a reflection error about a class not being found.
>
> To resolve the error and successfully calculate pi,
> 1. I ran Java 17 on the NodeManagers and
> 2. added 'export
> HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"' to their
> conf/hadoop-env.sh file.
>
> George
>
>


4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread George Magiros
I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn
using 4.0.0-preview1.  However, I got it to work only after fixing an issue
with the Yarn NodeManagers (Hadoop v3.3.6 and v3.4.0).  Namely, the issue
was:
1. If the NodeManagers used Java 11, Yarn threw an error about not finding
the jdk.incubator.vector module.
2. If the NodeManagers used Java 17, which has the jdk.incubator.vector
module, Yarn threw a reflection error about a class not being found.

To resolve the error and successfully calculate pi,
1. I ran Java 17 on the NodeManagers and
2. added 'export HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"'
to their conf/hadoop-env.sh file.

George
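
For context, the change in step 2 looks roughly like the following on each
NodeManager; a minimal sketch, assuming a standard Hadoop layout (the
--add-opens flag relaxes the Java 17 module system so Spark's reflective
access to java.lang internals still works):

    # conf/hadoop-env.sh on every NodeManager
    # Open java.base/java.lang to unnamed modules for Spark 4.x on Java 17
    export HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"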


Re: [FYI] SPARK-45981: Improve Python language test coverage

2023-12-02 Thread Hyukjin Kwon
Awesome!

On Sat, Dec 2, 2023 at 2:33 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community
> now has test coverage for all supported Python versions, starting today.
>
> - https://github.com/apache/spark/actions/runs/7061665420
>
> Here is a summary.
>
> 1. Main CI: All PRs and commits on `master` branch are tested with Python
> 3.9.
> 2. Daily CI:
> https://github.com/apache/spark/actions/workflows/build_python.yml
> - PyPy 3.8
> - Python 3.10
> - Python 3.11
> - Python 3.12
>
> This is a great addition for PySpark 4.0+ users and an extensible
> framework for all future Python versions.
>
> Thank you all for making this happen together!
>
> Best,
> Dongjoon.
>


[FYI] SPARK-45981: Improve Python language test coverage

2023-12-01 Thread Dongjoon Hyun
Hi, All.

As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community
now has test coverage for all supported Python versions, starting today.

- https://github.com/apache/spark/actions/runs/7061665420

Here is a summary.

1. Main CI: All PRs and commits on `master` branch are tested with Python
3.9.
2. Daily CI:
https://github.com/apache/spark/actions/workflows/build_python.yml
- PyPy 3.8
- Python 3.10
- Python 3.11
- Python 3.12

This is a great addition for PySpark 4.0+ users and an extensible framework
for all future Python versions.

Thank you all for making this happen together!

Best,
Dongjoon.
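
For anyone who wants to reproduce a slice of this coverage locally, the
in-repo test runner can target specific interpreters; a minimal sketch,
assuming the listed interpreters are already on your PATH:

    # Run the PySpark test suite against two specific Python versions
    python/run-tests --python-executables=python3.10,python3.12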


[VOTE][RESULT] PySpark Test Framework

2023-06-26 Thread Amanda Liu
The vote passes with 10 +1s (nine binding +1s) and one +0.

Thank you all for your participation and comments!

(* = binding)
+1:
- Holden Karau (*)
- Reynold Xin (*)
- Mich Talebzadeh
- Maciej Szymkiewicz (*)
- Hyukjin Kwon (*)
- Dongjoon Hyun (*)
- Ruifeng Zheng (*)
- Xinrong Meng (*)
- Liang-Chi Hsieh (*)
- Yikun Jiang (*)

+0: Jacek Laskowski

-1: None

Best,
Amanda Liu


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-24 Thread Yikun Jiang
+1

Regards,
Yikun


On Fri, Jun 23, 2023 at 6:17 AM L. C. Hsieh  wrote:

> +1
>
> On Thu, Jun 22, 2023 at 3:10 PM Xinrong Meng  wrote:
> >
> > +1
> >
> > Thanks for driving that!
> >
> > On Wed, Jun 21, 2023 at 10:25 PM Ruifeng Zheng 
> wrote:
> >>
> >> +1
> >>
> >> On Thu, Jun 22, 2023 at 1:11 PM Dongjoon Hyun 
> wrote:
> >>>
> >>> +1
> >>>
> >>> Dongjoon
> >>>
> >>> On Wed, Jun 21, 2023 at 8:56 PM Hyukjin Kwon 
> wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski 
> wrote:
> >>>>>
> >>>>> +0
> >>>>>
> >>>>> Pozdrawiam,
> >>>>> Jacek Laskowski
> >>>>> 
> >>>>> "The Internals Of" Online Books
> >>>>> Follow me on https://twitter.com/jaceklaskowski
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu <
> amandastephanie...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'd like to start the vote for SPIP: PySpark Test Framework.
> >>>>>>
> >>>>>> The high-level summary for the SPIP is that it proposes an official
> test framework for PySpark. Currently, there are only disparate open-source
> repos and blog posts for PySpark testing resources. We can streamline and
> simplify the testing process by incorporating test features, such as a
> PySpark Test Base class (which allows tests to share Spark sessions) and
> test util functions (for example, asserting dataframe and schema equality).
> >>>>>>
> >>>>>> SPIP doc:
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
> >>>>>>
> >>>>>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
> >>>>>>
> >>>>>> Discussion thread:
> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
> >>>>>>
> >>>>>> Please vote on the SPIP for the next 72 hours:
> >>>>>> [ ] +1: Accept the proposal as an official SPIP
> >>>>>> [ ] +0
> >>>>>> [ ] -1: I don’t think this is a good idea because __.
> >>>>>>
> >>>>>> Thank you!
> >>>>>>
> >>>>>> Best,
> >>>>>> Amanda Liu
>
>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-22 Thread L. C. Hsieh
+1

On Thu, Jun 22, 2023 at 3:10 PM Xinrong Meng  wrote:
>
> +1
>
> Thanks for driving that!
>
> On Wed, Jun 21, 2023 at 10:25 PM Ruifeng Zheng  wrote:
>>
>> +1
>>
>> On Thu, Jun 22, 2023 at 1:11 PM Dongjoon Hyun  
>> wrote:
>>>
>>> +1
>>>
>>> Dongjoon
>>>
>>> On Wed, Jun 21, 2023 at 8:56 PM Hyukjin Kwon  wrote:
>>>>
>>>> +1
>>>>
>>>> On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski  wrote:
>>>>>
>>>>> +0
>>>>>
>>>>> Pozdrawiam,
>>>>> Jacek Laskowski
>>>>> 
>>>>> "The Internals Of" Online Books
>>>>> Follow me on https://twitter.com/jaceklaskowski
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu  
>>>>> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>>>>>
>>>>>> The high-level summary for the SPIP is that it proposes an official test 
>>>>>> framework for PySpark. Currently, there are only disparate open-source 
>>>>>> repos and blog posts for PySpark testing resources. We can streamline 
>>>>>> and simplify the testing process by incorporating test features, such as 
>>>>>> a PySpark Test Base class (which allows tests to share Spark sessions) 
>>>>>> and test util functions (for example, asserting dataframe and schema 
>>>>>> equality).
>>>>>>
>>>>>> SPIP doc: 
>>>>>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>>>>>
>>>>>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>>>>>>
>>>>>> Discussion thread: 
>>>>>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>>>>>
>>>>>> Please vote on the SPIP for the next 72 hours:
>>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>>> [ ] +0
>>>>>> [ ] -1: I don’t think this is a good idea because __.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best,
>>>>>> Amanda Liu




Re: [VOTE][SPIP] PySpark Test Framework

2023-06-22 Thread Xinrong Meng
+1

Thanks for driving that!

On Wed, Jun 21, 2023 at 10:25 PM Ruifeng Zheng  wrote:

> +1
>
> On Thu, Jun 22, 2023 at 1:11 PM Dongjoon Hyun 
> wrote:
>
>> +1
>>
>> Dongjoon
>>
>> On Wed, Jun 21, 2023 at 8:56 PM Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
>>> On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski  wrote:
>>>
>>>> +0
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> 
>>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>>> Follow me on https://twitter.com/jaceklaskowski
>>>>
>>>> <https://twitter.com/jaceklaskowski>
>>>>
>>>>
>>>> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu <
>>>> amandastephanie...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>>>>
>>>>> The high-level summary for the SPIP is that it proposes an official
>>>>> test framework for PySpark. Currently, there are only disparate 
>>>>> open-source
>>>>> repos and blog posts for PySpark testing resources. We can streamline and
>>>>> simplify the testing process by incorporating test features, such as a
>>>>> PySpark Test Base class (which allows tests to share Spark sessions) and
>>>>> test util functions (for example, asserting dataframe and schema 
>>>>> equality).
>>>>>
>>>>> *SPIP doc:*
>>>>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>>>>
>>>>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>>>>>
>>>>> *Discussion thread:*
>>>>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>>>>
>>>>> Please vote on the SPIP for the next 72 hours:
>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>> [ ] +0
>>>>> [ ] -1: I don’t think this is a good idea because __.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best,
>>>>> Amanda Liu
>>>>>
>>>>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Ruifeng Zheng
+1

On Thu, Jun 22, 2023 at 1:11 PM Dongjoon Hyun 
wrote:

> +1
>
> Dongjoon
>
> On Wed, Jun 21, 2023 at 8:56 PM Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski  wrote:
>>
>>> +0
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> <https://twitter.com/jaceklaskowski>
>>>
>>>
>>> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>>>
>>>> The high-level summary for the SPIP is that it proposes an official
>>>> test framework for PySpark. Currently, there are only disparate open-source
>>>> repos and blog posts for PySpark testing resources. We can streamline and
>>>> simplify the testing process by incorporating test features, such as a
>>>> PySpark Test Base class (which allows tests to share Spark sessions) and
>>>> test util functions (for example, asserting dataframe and schema equality).
>>>>
>>>> *SPIP doc:*
>>>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>>>
>>>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>>>>
>>>> *Discussion thread:*
>>>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because __.
>>>>
>>>> Thank you!
>>>>
>>>> Best,
>>>> Amanda Liu
>>>>
>>>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Dongjoon Hyun
+1

Dongjoon

On Wed, Jun 21, 2023 at 8:56 PM Hyukjin Kwon  wrote:

> +1
>
> On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski  wrote:
>
>> +0
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> <https://twitter.com/jaceklaskowski>
>>
>>
>> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>>
>>> The high-level summary for the SPIP is that it proposes an official test
>>> framework for PySpark. Currently, there are only disparate open-source
>>> repos and blog posts for PySpark testing resources. We can streamline and
>>> simplify the testing process by incorporating test features, such as a
>>> PySpark Test Base class (which allows tests to share Spark sessions) and
>>> test util functions (for example, asserting dataframe and schema equality).
>>>
>>> *SPIP doc:*
>>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>>
>>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>>>
>>> *Discussion thread:*
>>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because __.
>>>
>>> Thank you!
>>>
>>> Best,
>>> Amanda Liu
>>>
>>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Hyukjin Kwon
+1

On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski  wrote:

> +0
>
> Pozdrawiam,
> Jacek Laskowski
> 
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> <https://twitter.com/jaceklaskowski>
>
>
> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu 
> wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>
>> The high-level summary for the SPIP is that it proposes an official test
>> framework for PySpark. Currently, there are only disparate open-source
>> repos and blog posts for PySpark testing resources. We can streamline and
>> simplify the testing process by incorporating test features, such as a
>> PySpark Test Base class (which allows tests to share Spark sessions) and
>> test util functions (for example, asserting dataframe and schema equality).
>>
>> *SPIP doc:*
>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>
>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>>
>> *Discussion thread:*
>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>
>> Please vote on the SPIP for the next 72 hours:
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because __.
>>
>> Thank you!
>>
>> Best,
>> Amanda Liu
>>
>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Jacek Laskowski
+0

Pozdrawiam,
Jacek Laskowski

"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

<https://twitter.com/jaceklaskowski>


On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu 
wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: PySpark Test Framework.
>
> The high-level summary for the SPIP is that it proposes an official test
> framework for PySpark. Currently, there are only disparate open-source
> repos and blog posts for PySpark testing resources. We can streamline and
> simplify the testing process by incorporating test features, such as a
> PySpark Test Base class (which allows tests to share Spark sessions) and
> test util functions (for example, asserting dataframe and schema equality).
>
> *SPIP doc:*
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>
> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>
> *Discussion thread:*
> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>
> Please vote on the SPIP for the next 72 hours:
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because __.
>
> Thank you!
>
> Best,
> Amanda Liu
>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Amanda Liu
Yes, let's extend the vote by two days in light of traveling for pride
weekend and conferences.

Best,
Amanda Liu


On Wed, Jun 21, 2023 at 8:41 AM Maciej  wrote:

> +1
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
>
> On 6/21/23 17:35, Holden Karau wrote:
>
> A small request, it’s pride weekend in San Francisco where some of the
> core developers are and right before one of the larger spark related
> conferences so more folks might be traveling than normal. Could we maybe
> extend the vote out an extra day or two just to give folks a chance to be
> heard?
>
> On Wed, Jun 21, 2023 at 8:30 AM Reynold Xin  wrote:
>
>> +1
>>
>> This is a great idea.
>>
>>
>> On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau 
>> wrote:
>>
>>> I’d like to start with a +1, better Python testing tools integrated into
>>> the project make sense.
>>>
>>> On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>>>
>>>> The high-level summary for the SPIP is that it proposes an official
>>>> test framework for PySpark. Currently, there are only disparate open-source
>>>> repos and blog posts for PySpark testing resources. We can streamline and
>>>> simplify the testing process by incorporating test features, such as a
>>>> PySpark Test Base class (which allows tests to share Spark sessions) and
>>>> test util functions (for example, asserting dataframe and schema equality).
>>>>
>>>> *SPIP doc:*
>>>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>>>
>>>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>>>>
>>>> *Discussion thread:*
>>>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because __.
>>>>
>>>> Thank you!
>>>>
>>>> Best,
>>>> Amanda Liu
>>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Maciej

+1

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC


On 6/21/23 17:35, Holden Karau wrote:
A small request, it’s pride weekend in San Francisco where some of the 
core developers are and right before one of the larger spark related 
conferences so more folks might be traveling than normal. Could we 
maybe extend the vote out an extra day or two just to give folks a 
chance to be heard?


On Wed, Jun 21, 2023 at 8:30 AM Reynold Xin  wrote:

+1

This is a great idea.


On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau
 wrote:

I’d like to start with a +1, better Python testing tools
integrated into the project make sense.

On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu
 wrote:

Hi all,

I'd like to start the vote for SPIP: PySpark Test Framework.

The high-level summary for the SPIP is that it proposes an
official test framework for PySpark. Currently, there are
only disparate open-source repos and blog posts for
PySpark testing resources. We can streamline and simplify
the testing process by incorporating test features, such
as a PySpark Test Base class (which allows tests to share
Spark sessions) and test util functions (for example,
asserting dataframe and schema equality).

*SPIP doc:*

https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v

*JIRA ticket:*
https://issues.apache.org/jira/browse/SPARK-44042

*Discussion thread:*
https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n

Please vote on the SPIP for the next 72 hours:
[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because __.

Thank you!

Best,
Amanda Liu

-- 
Twitter: https://twitter.com/holdenkarau

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): 
https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>

YouTube Live Streams: https://www.youtube.com/user/holdenkarau






Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
A small request, it’s pride weekend in San Francisco where some of the core
developers are and right before one of the larger spark related conferences
so more folks might be traveling than normal. Could we maybe extend the
vote out an extra day or two just to give folks a chance to be heard?

On Wed, Jun 21, 2023 at 8:30 AM Reynold Xin  wrote:

> +1
>
> This is a great idea.
>
>
> On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau 
> wrote:
>
>> I’d like to start with a +1, better Python testing tools integrated into
>> the project make sense.
>>
>> On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>>
>>> The high-level summary for the SPIP is that it proposes an official test
>>> framework for PySpark. Currently, there are only disparate open-source
>>> repos and blog posts for PySpark testing resources. We can streamline and
>>> simplify the testing process by incorporating test features, such as a
>>> PySpark Test Base class (which allows tests to share Spark sessions) and
>>> test util functions (for example, asserting dataframe and schema equality).
>>>
>>> *SPIP doc:*
>>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>>
>>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>>>
>>> *Discussion thread:*
>>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because __.
>>>
>>> Thank you!
>>>
>>> Best,
>>> Amanda Liu
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Mich Talebzadeh
+1 for me

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 21 Jun 2023 at 16:30, Holden Karau  wrote:

> I’d like to start with a +1, better Python testing tools integrated into
> the project make sense.
>
> On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu 
> wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: PySpark Test Framework.
>>
>> The high-level summary for the SPIP is that it proposes an official test
>> framework for PySpark. Currently, there are only disparate open-source
>> repos and blog posts for PySpark testing resources. We can streamline and
>> simplify the testing process by incorporating test features, such as a
>> PySpark Test Base class (which allows tests to share Spark sessions) and
>> test util functions (for example, asserting dataframe and schema equality).
>>
>> *SPIP doc:*
>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>>
>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>>
>> *Discussion thread:*
>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>>
>> Please vote on the SPIP for the next 72 hours:
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because __.
>>
>> Thank you!
>>
>> Best,
>> Amanda Liu
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Reynold Xin
+1

This is a great idea.

On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau <hol...@pigscanfly.ca> wrote:

> 
> I’d like to start with a +1, better Python testing tools integrated into
> the project make sense.
> 
> On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu
> <amandastephanie...@gmail.com> wrote:
> 
> 
>> Hi all,
>> 
>> I'd like to start the vote for SPIP: PySpark Test Framework.
>> 
>> The high-level summary for the SPIP is that it proposes an official test
>> framework for PySpark. Currently, there are only disparate open-source
>> repos and blog posts for PySpark testing resources. We can streamline and
>> simplify the testing process by incorporating test features, such as a
>> PySpark Test Base class (which allows tests to share Spark sessions) and
>> test util functions (for example, asserting dataframe and schema
>> equality).
>> 
>> *SPIP doc:*
>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>> 
>> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>> 
>> *Discussion thread:*
>> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>> 
>> Please vote on the SPIP for the next 72 hours:
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because __.
>> 
>> Thank you!
>> 
>> Best,
>> Amanda Liu
>> 
>> 
> 
> --
> Twitter: https://twitter.com/holdenkarau
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>



Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
I’d like to start with a +1, better Python testing tools integrated into
the project make sense.

On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu 
wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: PySpark Test Framework.
>
> The high-level summary for the SPIP is that it proposes an official test
> framework for PySpark. Currently, there are only disparate open-source
> repos and blog posts for PySpark testing resources. We can streamline and
> simplify the testing process by incorporating test features, such as a
> PySpark Test Base class (which allows tests to share Spark sessions) and
> test util functions (for example, asserting dataframe and schema equality).
>
> *SPIP doc:*
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>
> *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042
>
> *Discussion thread:*
> https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n
>
> Please vote on the SPIP for the next 72 hours:
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because __.
>
> Thank you!
>
> Best,
> Amanda Liu
>
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


[VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Amanda Liu
Hi all,

I'd like to start the vote for SPIP: PySpark Test Framework.

The high-level summary for the SPIP is that it proposes an official test
framework for PySpark. Currently, there are only disparate open-source
repos and blog posts for PySpark testing resources. We can streamline and
simplify the testing process by incorporating test features, such as a
PySpark Test Base class (which allows tests to share Spark sessions) and
test util functions (for example, asserting dataframe and schema equality).

*SPIP doc:*
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v

*JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042

*Discussion thread:*
https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n

Please vote on the SPIP for the next 72 hours:
[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because __.

Thank you!

Best,
Amanda Liu


Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-15 Thread Mich Talebzadeh
+1  for me.

The SPIP document is well written as well.

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 14 Jun 2023 at 00:10, Amanda Liu 
wrote:

> Hi all,
>
> I'd like to start a discussion about implementing an official PySpark test
> framework. Currently, there's no official test framework, but only various
> open-source repos and blog posts.
>
> Many of these open-source resources are very popular, which demonstrates
> user-demand for PySpark testing capabilities. spark-testing-base
> <https://github.com/holdenk/spark-testing-base> has 1.4k stars, and chispa
> <https://github.com/MrPowers/chispa> has 532k downloads/month. However,
> it can be confusing for users to piece together disparate resources to
> write their own PySpark tests (see The Elephant in the Room: How to Write
> PySpark Tests
> <https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34>
> ).
>
> We can streamline and simplify the testing process by incorporating test
> features, such as a PySpark Test Base class (which allows tests to share
> Spark sessions) and test util functions (for example, asserting dataframe
> and schema equality).
>
> Please see the SPIP document attached:
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>
> I would appreciate it if you could share your thoughts on this proposal.
>
> Thank you!
> Amanda Liu
>


Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-14 Thread Ruifeng Zheng
+1 from my side

sounds good, it will be helpful to both users and contributors to improve
the test coverage

On Wed, Jun 14, 2023 at 8:27 AM Hyukjin Kwon  wrote:

> Yeah, I have been thinking about this too, and Holden did some work here
> that this SPIP will reuse. I support this.
>
> On Wed, 14 Jun 2023 at 08:10, Amanda Liu 
> wrote:
>
>> Hi all,
>>
>> I'd like to start a discussion about implementing an official PySpark
>> test framework. Currently, there's no official test framework, but only
>> various open-source repos and blog posts.
>>
>> Many of these open-source resources are very popular, which demonstrates
>> user-demand for PySpark testing capabilities. spark-testing-base
>> <https://github.com/holdenk/spark-testing-base> has 1.4k stars, and
>> chispa <https://github.com/MrPowers/chispa> has 532k downloads/month.
>> However, it can be confusing for users to piece together disparate
>> resources to write their own PySpark tests (see The Elephant in the
>> Room: How to Write PySpark Tests
>> <https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34>
>> ).
>>
>> We can streamline and simplify the testing process by incorporating test
>> features, such as a PySpark Test Base class (which allows tests to share
>> Spark sessions) and test util functions (for example, asserting dataframe
>> and schema equality).
>>
>> Please see the SPIP document attached:
>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>>
>> I would appreciate it if you could share your thoughts on this proposal.
>>
>> Thank you!
>> Amanda Liu
>>
>


Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-13 Thread Hyukjin Kwon
Yeah, I have been thinking about this too, and Holden did some work here
that this SPIP will reuse. I support this.

On Wed, 14 Jun 2023 at 08:10, Amanda Liu 
wrote:

> Hi all,
>
> I'd like to start a discussion about implementing an official PySpark test
> framework. Currently, there's no official test framework, but only various
> open-source repos and blog posts.
>
> Many of these open-source resources are very popular, which demonstrates
> user-demand for PySpark testing capabilities. spark-testing-base
> <https://github.com/holdenk/spark-testing-base> has 1.4k stars, and chispa
> <https://github.com/MrPowers/chispa> has 532k downloads/month. However,
> it can be confusing for users to piece together disparate resources to
> write their own PySpark tests (see The Elephant in the Room: How to Write
> PySpark Tests
> <https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34>
> ).
>
> We can streamline and simplify the testing process by incorporating test
> features, such as a PySpark Test Base class (which allows tests to share
> Spark sessions) and test util functions (for example, asserting dataframe
> and schema equality).
>
> Please see the SPIP document attached:
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>
> I would appreciate it if you could share your thoughts on this proposal.
>
> Thank you!
> Amanda Liu
>


[DISCUSS] SPIP: Add PySpark Test Framework

2023-06-13 Thread Amanda Liu
Hi all,

I'd like to start a discussion about implementing an official PySpark test
framework. Currently, there's no official test framework, but only various
open-source repos and blog posts.

Many of these open-source resources are very popular, which demonstrates
user-demand for PySpark testing capabilities. spark-testing-base
<https://github.com/holdenk/spark-testing-base> has 1.4k stars, and chispa
<https://github.com/MrPowers/chispa> has 532k downloads/month. However, it
can be confusing for users to piece together disparate resources to write
their own PySpark tests (see The Elephant in the Room: How to Write PySpark
Tests
<https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34>
).

We can streamline and simplify the testing process by incorporating test
features, such as a PySpark Test Base class (which allows tests to share
Spark sessions) and test util functions (for example, asserting dataframe
and schema equality).

Please see the SPIP document attached:
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042

I would appreciate it if you could share your thoughts on this proposal.

Thank you!
Amanda Liu


Observed consistent test failure in master (ParquetIOSuite)

2022-06-27 Thread Jungtaek Lim
Hi,

I just observed a test failure in ParquetIOSuite which I can consistently
reproduce with IntelliJ. I haven't had a chance to run the test with Maven/sbt yet.

I filed SPARK-39622 <https://issues.apache.org/jira/browse/SPARK-39622> for
this failure.

It'd be awesome if someone having context looks into this sooner.

Thanks!
Jungtaek Lim (HeartSaVioR)
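
To check whether the failure also reproduces outside IntelliJ, a single suite
can be run from sbt; a minimal sketch, assuming ParquetIOSuite still lives
under org.apache.spark.sql.execution.datasources.parquet in the sql module:

    # Run only ParquetIOSuite from the command line
    build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetIOSuite"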


Maven Test blocks with TransportCipherSuite

2022-05-20 Thread Qian SUN
Hi, team.

I ran the Maven command to run the unit tests and got an NPE.

command: ./build/mvn test
refer to
https://spark.apache.org/docs/latest/building-spark.html#running-tests

NPE is as follow:
22/05/20 16:32:45.450 main WARN AbstractChannelHandlerContext: Failed to
mark a promise as failure because it has succeeded already:
DefaultChannelPromise@366ef90e(success)
java.lang.NullPointerException: null
at
org.apache.spark.network.crypto.TransportCipher$EncryptionHandler.close(TransportCipher.java:137)
~[classes/:?]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:622)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:606)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:994)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at io.netty.channel.AbstractChannel.close(AbstractChannel.java:280)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.close(EmbeddedChannel.java:568)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.close(EmbeddedChannel.java:555)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.finish(EmbeddedChannel.java:503)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.finish(EmbeddedChannel.java:483)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
org.apache.spark.network.crypto.TransportCipherSuite.testBufferNotLeaksOnInternalError(TransportCipherSuite.java:78)
~[test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
~[?:1.8.0_291]
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
~[?:1.8.0_291]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_291]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291]
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
~[junit-4.13.2.jar:4.13.2]
at
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]
at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]


Anyone seeing the same exception?

-- 
Best!
Qian SUN
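
As a side note, the failing test can be run in isolation while debugging; a
minimal sketch based on the individual-test recipe in the Spark developer
docs (TransportCipherSuite is a JUnit test in common/network-common, so the
ScalaTest suites are disabled with wildcardSuites=none):

    # Run only the Java test TransportCipherSuite in its module
    build/mvn test -pl common/network-common -DwildcardSuites=none -Dtest=TransportCipherSuite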


Re: Skip single integration test case in Spark on K8s

2022-03-16 Thread Dongjoon Hyun
-user@spark

For the cloud backend, you need to exclude the minikube-specific tests and
the local-only test (SparkRemoteFileTest).

-Dtest.exclude.tags=minikube,local

You can find more options including SBT commands here.


https://github.com/apache/spark/tree/master/resource-managers/kubernetes/integration-tests

Dongjoon.
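
Concretely, that means appending the exclude flag to the Maven invocation
quoted below; a minimal sketch of just the relevant portion, with the
site-specific -Dspark.kubernetes.test.* values left out:

    # Exclude minikube-only and local-only tags when testing a cloud backend
    build/mvn install \
      -pl resource-managers/kubernetes/integration-tests -am \
      -Pkubernetes -Pkubernetes-integration-tests \
      -Dspark.kubernetes.test.deployMode=cloud \
      -Dtest.include.tags=k8s \
      -Dtest.exclude.tags=minikube,local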


On Wed, Mar 16, 2022 at 6:11 AM Pralabh Kumar 
wrote:

> Hi Spark team
>
> I am running Spark kubernetes integration test suite on cloud.
>
> build/mvn install \
>
> -f  pom.xml \
>
> -pl resource-managers/kubernetes/integration-tests -am -Pscala-2.12
> -Phadoop-3.1.1 -Phive -Phive-thriftserver -Pyarn -Pkubernetes
> -Pkubernetes-integration-tests \
>
> -Djava.version=8 \
>
> -Dspark.kubernetes.test.sparkTgz= \
>
> -Dspark.kubernetes.test.imageTag=<> \
>
> -Dspark.kubernetes.test.imageRepo=<repo> \
>
> -Dspark.kubernetes.test.deployMode=cloud \
>
> -Dtest.include.tags=k8s \
>
> -Dspark.kubernetes.test.javaImageTag= \
>
> -Dspark.kubernetes.test.namespace= \
>
> -Dspark.kubernetes.test.serviceAccountName=spark \
>
> -Dspark.kubernetes.test.kubeConfigContext=<> \
>
> -Dspark.kubernetes.test.master=<> \
>
> -Dspark.kubernetes.test.jvmImage=<> \
>
> -Dspark.kubernetes.test.pythonImage=<> \
>
> -Dlog4j.logger.org.apache.spark=DEBUG
>
>
>
> I am able to run some test cases successfully, but some are failing. For
> example, "Run SparkRemoteFileTest using a Remote data file" in KubernetesSuite
> is failing.
>
>
> Is there a way to skip running some of the test cases?
>
>
>
> Please help me on the same.
>
>
> Regards
>
> Pralabh Kumar
>


Skip single integration test case in Spark on K8s

2022-03-16 Thread Pralabh Kumar
Hi Spark team

I am running Spark kubernetes integration test suite on cloud.

build/mvn install \

-f  pom.xml \

-pl resource-managers/kubernetes/integration-tests -am -Pscala-2.12
-Phadoop-3.1.1 -Phive -Phive-thriftserver -Pyarn -Pkubernetes
-Pkubernetes-integration-tests \

-Djava.version=8 \

-Dspark.kubernetes.test.sparkTgz= \

-Dspark.kubernetes.test.imageTag=<> \

-Dspark.kubernetes.test.imageRepo=<repo> \

-Dspark.kubernetes.test.deployMode=cloud \

-Dtest.include.tags=k8s \

-Dspark.kubernetes.test.javaImageTag= \

-Dspark.kubernetes.test.namespace= \

-Dspark.kubernetes.test.serviceAccountName=spark \

-Dspark.kubernetes.test.kubeConfigContext=<> \

-Dspark.kubernetes.test.master=<> \

-Dspark.kubernetes.test.jvmImage=<> \

-Dspark.kubernetes.test.pythonImage=<> \

-Dlog4j.logger.org.apache.spark=DEBUG



I am able to run some test cases successfully, but some are failing. For
example, "Run SparkRemoteFileTest using a Remote data file" in KubernetesSuite
is failing.


Is there a way to skip running some of the test cases?



Please help me on the same.


Regards

Pralabh Kumar


Re: [How To] run test suites for specific module

2022-01-24 Thread Qian SUN
Hi Shen

You can use sbt to run a specific suite.

1. run sbt shell.
   $ bash build/sbt
2. specify project.
   sbt > project core
  You can get the project name from the `sbt.project.name` property in the
module's pom.xml
3. Finally, you can run a specific suite
   sbt > testOnly org.apache.spark.scheduler.DAGSchedulerSuite

Hope this helps
Best regards,
Qian Sun
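
The same thing also works non-interactively in one shot; a minimal sketch,
using the suite from the example above:

    # Run a single suite in the core module without entering the sbt shell
    build/sbt "core/testOnly org.apache.spark.scheduler.DAGSchedulerSuite"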

On Tue, Jan 25, 2022 at 07:44, Fangjia Shen wrote:

> Hello all,
>
> How do you run Spark's test suites when you want to test the correctness
> of your code? Is there a way to run a specific test suite for Spark? For
> example, running test suite XXXSuite alone, instead of every class under
> the test/ directories.
>
> Here's some background info about what I want to do: I'm a graduate
> student trying to study Spark's design and find ways to improve Spark's
> performance by doing Software/Hardware co-design. I'm relatively new to
> Maven and so far struggling to find to a way to properly run Spark's own
> test suites.
>
> Let's say I did some modifications to a XXXExec node which belongs to the
> org.apache.spark.sql package. I want to see if my design passes the test
> cases. What should I do?
>
>
> What command should I use:
>
>  ./build/mvn test  or  ./dev/run-tests ?
>
> And where should I run that command:
>
> <spark root>  or  <module dir> ? - where <module dir> is where
> the modified scala file is located, e.g. "/sql/core/".
>
>
> I tried adding -Dtest=XXXSuite to mvn test but it still runs tens of
> thousands of tests. This is taking way too much time and unbearable if I'm
> just modifying a few files in a specific module.
>
> I would really appreciate any suggestion or comment.
>
>
> Best regards,
>
> Fangjia Shen
>
> Purdue University
>
>
>
>

-- 
Best!
Qian SUN


Re: [How To] run test suites for specific module

2022-01-24 Thread Maciej
Hi,

Please check the relevant section of the developer tools docs:

https://spark.apache.org/developer-tools.html#running-individual-tests
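
For the Maven case asked about here, that page's recipe amounts to using the
ScalaTest plugin's suite selector instead of surefire's -Dtest; a minimal
sketch, run from the Spark root with a hypothetical XXXSuite in sql/core
(assuming the module's dependencies have already been built):

    # Disable the Java test goal and run a single ScalaTest suite in one module
    build/mvn test -pl sql/core -Dtest=none -DwildcardSuites=org.apache.spark.sql.XXXSuite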

On 1/25/22 00:44, Fangjia Shen wrote:
> Hello all,
> 
> How do you run Spark's test suites when you want to test the correctness
> of your code? Is there a way to run a specific test suite for Spark? For
> example, running test suite XXXSuite alone, instead of every class under
> the test/ directories.
> 
> Here's some background info about what I want to do: I'm a graduate
> student trying to study Spark's design and find ways to improve Spark's
> performance by doing Software/Hardware co-design. I'm relatively new to
> Maven and so far struggling to find to a way to properly run Spark's own
> test suites.
> 
> Let's say I did some modifications to a XXXExec node which belongs to
> the org.apache.spark.sql package. I want to see if my design passes the
> test cases. What should I do?
> 
> 
> What command should I use:
> 
>  ./build/mvn test  or  ./dev/run-tests ?
> 
> And where should I run that command:
> 
>     <spark root>  or  <module dir> ? - where <module dir> is where
> the modified scala file is located, e.g. "/sql/core/".
> 
> 
> I tried adding -Dtest=XXXSuite to mvn test but it still runs tens
> of thousands of tests. This is taking way too much time and unbearable
> if I'm just modifying a few files in a specific module.
> 
> I would really appreciate any suggestion or comment.
> 
> 
> Best regards,
> 
> Fangjia Shen
> 
> Purdue University
> 
> 
> 


-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC




[How To] run test suites for specific module

2022-01-24 Thread Fangjia Shen

Hello all,

How do you run Spark's test suites when you want to test the correctness 
of your code? Is there a way to run a specific test suite for Spark? For 
example, running test suite XXXSuite alone, instead of every class under 
the test/ directories.


Here's some background info about what I want to do: I'm a graduate 
student trying to study Spark's design and find ways to improve Spark's 
performance by doing Software/Hardware co-design. I'm relatively new to 
Maven and so far struggling to find to a way to properly run Spark's own 
test suites.


Let's say I did some modifications to a XXXExec node which belongs to 
the org.apache.spark.sql package. I want to see if my design passes the 
test cases. What should I do?



What command should I use:

./build/mvn test  or  ./dev/run-tests ?

And where should I run that command:

<spark root>  or  <module dir> ? - where <module dir> is where the
modified scala file is located, e.g. "/sql/core/".



I tried adding -Dtest=XXXSuite to mvn test but it still runs tens
of thousands of tests. This is taking way too much time and unbearable
if I'm just modifying a few files in a specific module.


I would really appreciate any suggestion or comment.


Best regards,

Fangjia Shen

Purdue University




Re: ivy unit test case filing for Spark

2021-12-21 Thread Sean Owen
You would have to make it available? This doesn't seem like a spark issue.

On Tue, Dec 21, 2021, 10:48 AM Pralabh Kumar  wrote:

> Hi Spark Team
>
> I am building Spark inside a VPN, but the unit test case below is failing.
> It points to an Ivy location that cannot be reached from within the VPN. Any
> help would be appreciated.
>
> test("SPARK-33084: Add jar support Ivy URI -- default transitive = true")
> {
>   *sc *= new SparkContext(new 
> SparkConf().setAppName("test").setMaster("local-cluster[3,
> 1, 1024]"))
>   *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*")
>   assert(*sc*.listJars().exists(_.contains(
> "org.apache.hive_hive-storage-api-2.7.0.jar")))
>   assert(*sc*.listJars().exists(_.contains(
> "commons-lang_commons-lang-2.6.jar")))
> }
>
> Error
>
> - SPARK-33084: Add jar support Ivy URI -- default transitive = true ***
> FAILED ***
> java.lang.RuntimeException: [unresolved dependency:
> org.apache.hive#hive-storage-api;2.7.0: not found]
> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(
> SparkSubmit.scala:1447)
> at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
> DependencyUtils.scala:185)
> at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
> DependencyUtils.scala:159)
> at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996)
> at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928)
> at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite.
> scala:1041)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
>
> Regards
> Pralabh Kumar
>
>
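
Following up on Sean's point, one common way to make such artifacts reachable
inside a VPN is to point Spark's Ivy resolution at an internal mirror; a
minimal sketch, where the repository URL and settings file path are
hypothetical placeholders:

    # spark-defaults.conf: custom Ivy settings for --packages / ivy:// URIs
    spark.jars.ivySettings=/etc/spark/ivysettings.xml

    <!-- /etc/spark/ivysettings.xml: resolve everything via an internal mirror -->
    <ivysettings>
      <settings defaultResolver="internal"/>
      <resolvers>
        <ibiblio name="internal" m2compatible="true"
                 root="https://repo.example.internal/maven2/"/>
      </resolvers>
    </ivysettings>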
>


ivy unit test case filing for Spark

2021-12-21 Thread Pralabh Kumar
Hi Spark Team

I am building Spark inside a VPN, but the unit test case below is failing.
It points to an Ivy location that cannot be reached from within the VPN. Any
help would be appreciated.

test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") {
  *sc *= new SparkContext(new
SparkConf().setAppName("test").setMaster("local-cluster[3,
1, 1024]"))
  *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*")
  assert(*sc*.listJars().exists(_.contains(
"org.apache.hive_hive-storage-api-2.7.0.jar")))
  assert(*sc*.listJars().exists(_.contains(
"commons-lang_commons-lang-2.6.jar")))
}

Error

- SPARK-33084: Add jar support Ivy URI -- default transitive = true ***
FAILED ***
java.lang.RuntimeException: [unresolved dependency:
org.apache.hive#hive-storage-api;2.7.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(
SparkSubmit.scala:1447)
at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
DependencyUtils.scala:185)
at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
DependencyUtils.scala:159)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928)
at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite.
scala:1041)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)

Regards
Pralabh Kumar
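As a side note on the ivy:// scheme this test exercises (added by
SPARK-33084): the URI accepts query parameters, so transitive resolution can
be disabled when only the main artifact is mirrored internally. A small
sketch:

// Sketch: with transitive=false, only hive-storage-api itself is fetched,
// so the transitive commons-lang jar would NOT appear in sc.listJars().
sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=false")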


Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
I remember it's turned on by default (?). If not, yeah, we should document it.

2021년 4월 15일 (목) 오후 1:14, Kent Yao 님이 작성:

> Thanks Hyukjin and Yikun,
>
> > 2. New Forks have to turn on GitHub action by the fork owner manually
>
> And we may still need a suitable place to make this note clearer to new
> contributors, or to someone who deletes and re-forks their forked repo.
>
> Thanks
>
>
> Kent Yao
> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
> a spark enthusiast
> kyuubi <https://github.com/yaooqinn/kyuubi> is a unified multi-tenant JDBC
> interface for large-scale data processing and analytics, built on top
> of Apache Spark <http://spark.apache.org/>.
> spark-authorizer <https://github.com/yaooqinn/spark-authorizer>: A Spark
> SQL extension which provides SQL Standard Authorization for Apache
> Spark <http://spark.apache.org/>.
> spark-postgres <https://github.com/yaooqinn/spark-postgres>: A library for
> reading data from and transferring data to Postgres / Greenplum with Spark
> SQL and DataFrames, 10~100x faster.
> itatchi <https://github.com/yaooqinn/spark-func-extras>: A library that
> brings useful functions from various modern database management systems to
> Apache Spark <http://spark.apache.org/>.
>
>
>
> On 04/15/2021 12:09,Hyukjin Kwon
>  wrote:
>
The issue is fixed now. Please keep monitoring this. Thank you all! The
Spark community is super active and cooperative!
>
> 2021년 4월 15일 (목) 오전 11:01, Hyukjin Kwon 님이 작성:
>
>> The fix will be straightforward. In the GitHub Actions workflow, we can either:
>> - remove the fast-forward option and see if it works
>> - or git rebase before merging the branch
>>
>> 2021년 4월 15일 (목) 오전 11:00, Hyukjin Kwon 님이 작성:
>>
>>> I think it works mostly correctly as Dongjoon investigated and shared
>>> (Thanks a lot!).
>>> One problem is that the sync-to-master check seems too strict (
>>> https://github.com/apache/spark/pull/32168#issuecomment-819736508).
>>> Thanks, Yikun.
>>> I think we should make it less strict. I can create a PR right away, but I
>>> would like to encourage Yikun or Kent to do it, to keep the credit for
>>> their investigation.
>>>
>>> 2021년 4월 15일 (목) 오전 7:21, Dongjoon Hyun 님이 작성:
>>>
>>>> Hi, Kent.
>>>>
>>>> I checked (1) in your PR, but those test result comments look correct
>>>> to me.
>>>> Please note that both Jenkins and GitHub Action leave the same number
>>>> of comments on the same GitHash.
>>>> Given that, these are not fake comments. It looks like a real result of
>>>> your commits on that PR.
>>>>
>>>> GitHash: 23248c3
>>>>  https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>>> (GitHub Action)
>>>>  https://github.com/apache/spark/pull/32144#issuecomment-819647368
>>>> (Jenkins)
>>>>
>>>> GitHash: 8dbed7b
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819684782
>>>> (GitHub Action)
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819578976
>>>> (Jenkins)
>>>>
>>>> GitHash: a3a6c5e
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819690465
>>>> (GitHub Action)
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819793557
>>>> (Jenkins)
>>>>
>>>> GitHash: b6d26b7
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819691416
>>>> (GitHub Action)
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819791485
>>>> (Jenkins)
>>>>
>>>> Could you recheck it?
>>>>
>>>>
>>>> 1. Github-actions notification could be wrong when another PR opened
>>>>> with some of the same commits, and you will get a lot of fake comments then.
>>>>> Meanwhile, the new PR gets no comments, even if it is actually the
>>>>> chosen one.
>>>>>1.1
>>>>> https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>>>>
>>>>
>>>>
>>>> On Wed, Apr 14, 2021 at 10:41 AM Kent Yao  wrote:
>>>>
>>>>> Hi ALL, here is something I noticed after this change:
>>>>>
>>>>> 1. Github-actions notification could be wrong when another PR opened
>>>>> with some of the same commits, and you will get a lot of fake comments then.

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Kent Yao







Thanks Hyukjin and Yikun,

> 2. New Forks have to turn on GitHub action by the fork owner manually

And we may still need a suitable place to make this note clearer to new
contributors, or to someone who deletes and re-forks their forked repo.

Thanks

Kent Yao
@ Data Science Center, Hangzhou Research Institute, NetEase Corp.
a spark enthusiast
kyuubi is a unified multi-tenant JDBC interface for large-scale data
processing and analytics, built on top of Apache Spark.
spark-authorizer: A Spark SQL extension which provides SQL Standard
Authorization for Apache Spark.
spark-postgres: A library for reading data from and transferring data to
Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
itatchi: A library that brings useful functions from various modern
database management systems to Apache Spark.

On 04/15/2021 12:09, Hyukjin Kwon wrote:

The issue is fixed now. Please keep monitoring this. Thank you all! The
Spark community is super active and cooperative!

2021년 4월 15일 (목) 오전 11:01, Hyukjin Kwon <gurwls...@gmail.com>님이 작성:

The fix will be straightforward. In the GitHub Actions workflow, we can either:
- remove the fast-forward option and see if it works
- or git rebase before merging the branch

2021년 4월 15일 (목) 오전 11:00, Hyukjin Kwon <gurwls...@gmail.com>님이 작성:

I think it works mostly correctly, as Dongjoon investigated and shared
(thanks a lot!). One problem is that the sync-to-master check seems too
strict (https://github.com/apache/spark/pull/32168#issuecomment-819736508).
Thanks, Yikun. I think we should make it less strict. I can create a PR
right away, but I would like to encourage Yikun or Kent to do it, to keep
the credit for their investigation.

2021년 4월 15일 (목) 오전 7:21, Dongjoon Hyun <dongjoon.h...@gmail.com>님이 작성:

Hi, Kent.

I checked (1) in your PR, but those test result comments look correct to me.
Please note that both Jenkins and GitHub Action leave the same number of
comments on the same GitHash. Given that, these are not fake comments. It
looks like a real result of your commits on that PR.

GitHash: 23248c3
    https://github.com/apache/spark/pull/32144#issuecomment-819679970 (GitHub Action)
    https://github.com/apache/spark/pull/32144#issuecomment-819647368 (Jenkins)

GitHash: 8dbed7b
    https://github.com/apache/spark/pull/32144#issuecomment-819684782 (GitHub Action)
    https://github.com/apache/spark/pull/32144#issuecomment-819578976 (Jenkins)

GitHash: a3a6c5e
    https://github.com/apache/spark/pull/32144#issuecomment-819690465 (GitHub Action)
    https://github.com/apache/spark/pull/32144#issuecomment-819793557 (Jenkins)

GitHash: b6d26b7
    https://github.com/apache/spark/pull/32144#issuecomment-819691416 (GitHub Action)
    https://github.com/apache/spark/pull/32144#issuecomment-819791485 (Jenkins)

Could you recheck it?

> 1. Github-actions notification could be wrong when another PR opened
> with some same commits, and you will get a lot of fake comments then.
> Meanwhile, the new PR gets no comments, even if it is actually the
> chosen one.
>    1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970

On Wed, Apr 14, 2021 at 10:41 AM Kent Yao <yaooq...@gmail.com> wrote:

Hi ALL, here is something I noticed after this change:

1. Github-actions notification could be wrong when another PR is opened
with some of the same commits, and you will get a lot of fake comments then.
Meanwhile, the new PR gets no comments, even if it is actually the
chosen one.
   1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
2. New forks have to have GitHub Actions turned on manually by the fork owner
3. `Notify test workflow` keeps waiting when the build workflow is canceled
or the whole fork is gone
4. After refreshing master or even re-forking :(, I still got failures,
and it seems I am not alone
   4.1. https://github.com/apache/spark/pull/32168 (PR after sync)
   4.2. https://github.com/apache/spark/pull/32172 (PR after re-forked)
   4.3. https://github.com/attilapiros/spark/runs/2344911058?check_suite_focus=true
(some other failures noticed)


Bests,

Kent

Dongjoon Hyun <dongjoon.h...@gmail.com> 于2021年4月14日周三 下午11:34写道:
>
> Thank you again, Hyukjin.
>
> Bests,
> Dongjoon.
>
> On Wed, Apr 14, 2021 at 5:25 AM Kent Yao <yaooq...@gmail.com> wrote:
>>
>> Cool, thanks!
>>
>> Hyukjin Kwon <gurwls...@gmail.com> 于2021年4月14日周三 下午8:19写道:
>>>
>>> Good point! I had to clarify.
>>> Once is enough. The sync is needed for your branch to include the changes of https://github.com/apache/spark/pull/32092.
>>>
>>>
>>> 2021년 4월 14일 (수) 오후 9:11, Kent Yao <yaooq...@gmail.com>님이 작성:
>>>>
>>>> Hi Hyukjin,
>>>>
>>>> > Please sync your branch to the latest master branch in Apache Spark in order for the main repository to run the workflow and detect it.

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
The issue is fixed now. Please keep monitoring this. Thank you all! The
Spark community is super active and cooperative!

2021년 4월 15일 (목) 오전 11:01, Hyukjin Kwon 님이 작성:

> The fix will be straightforward. In the GitHub Actions workflow, we can either:
> - remove the fast-forward option and see if it works
> - or git rebase before merging the branch
>
> 2021년 4월 15일 (목) 오전 11:00, Hyukjin Kwon 님이 작성:
>
>> I think it works mostly correctly as Dongjoon investigated and shared
>> (Thanks a lot!).
>> One problem is that the sync-to-master check seems too strict (
>> https://github.com/apache/spark/pull/32168#issuecomment-819736508).
>> Thanks, Yikun.
>> I think we should make it less strict. I can create a PR right away, but I
>> would like to encourage Yikun or Kent to do it, to keep the credit for
>> their investigation.
>>
>> 2021년 4월 15일 (목) 오전 7:21, Dongjoon Hyun 님이 작성:
>>
>>> Hi, Kent.
>>>
>>> I checked (1) in your PR, but those test result comments look correct to
>>> me.
>>> Please note that both Jenkins and GitHub Action leave the same number of
>>> comments on the same GitHash.
>>> Given that, these are not fake comments. It looks like a real result of
>>> your commits on that PR.
>>>
>>> GitHash: 23248c3
>>>  https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>> (GitHub Action)
>>>  https://github.com/apache/spark/pull/32144#issuecomment-819647368
>>> (Jenkins)
>>>
>>> GitHash: 8dbed7b
>>> https://github.com/apache/spark/pull/32144#issuecomment-819684782
>>> (GitHub Action)
>>> https://github.com/apache/spark/pull/32144#issuecomment-819578976
>>> (Jenkins)
>>>
>>> GitHash: a3a6c5e
>>> https://github.com/apache/spark/pull/32144#issuecomment-819690465
>>> (GitHub Action)
>>> https://github.com/apache/spark/pull/32144#issuecomment-819793557
>>> (Jenkins)
>>>
>>> GitHash: b6d26b7
>>> https://github.com/apache/spark/pull/32144#issuecomment-819691416
>>> (GitHub Action)
>>> https://github.com/apache/spark/pull/32144#issuecomment-819791485
>>> (Jenkins)
>>>
>>> Could you recheck it?
>>>
>>>
>>> 1. Github-actions notification could be wrong when another PR is opened
>>>> with some of the same commits, and you will get a lot of fake comments then.
>>>> Meanwhile, the new PR gets no comments, even if it is actually the
>>>> chosen one.
>>>>1.1
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>>>
>>>
>>>
>>> On Wed, Apr 14, 2021 at 10:41 AM Kent Yao  wrote:
>>>
>>>> Hi ALL, here is something I noticed after this change:
>>>>
>>>> 1. Github-actions notification could be wrong when another PR is opened
>>>> with some of the same commits, and you will get a lot of fake comments then.
>>>> Meanwhile, the new PR gets no comments, even if it is actually the
>>>> chosen one.
>>>>1.1
>>>> https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>>> 2. New forks have to have GitHub Actions turned on manually by the fork owner
>>>> 3. `Notify test workflow` keeps waiting when the build workflow is canceled
>>>> or the whole fork is gone
>>>> 4. After refreshing master or even re-forking :(, I still got failures,
>>>> and it seems I am not alone
>>>>4.1. https://github.com/apache/spark/pull/32168 (PR after sync)
>>>>4.2. https://github.com/apache/spark/pull/32172 (PR after re-forked)
>>>>4.3.
>>>> https://github.com/attilapiros/spark/runs/2344911058?check_suite_focus=true
>>>> (some other failures noticed)
>>>>
>>>>
>>>> Bests,
>>>>
>>>> Kent
>>>>
>>>> Dongjoon Hyun  于2021年4月14日周三 下午11:34写道:
>>>> >
>>>> > Thank you again, Hyukjin.
>>>> >
>>>> > Bests,
>>>> > Dongjoon.
>>>> >
>>>> > On Wed, Apr 14, 2021 at 5:25 AM Kent Yao  wrote:
>>>> >>
>>>> >> Cool, thanks!
>>>> >>
>>>> >> Hyukjin Kwon  于2021年4月14日周三 下午8:19写道:
>>>> >>>
>>>> >>> Good point! I had to clarify.
>>>> >>> Once is enough. The sync is needed for your branch to include the
>>>> changes of https://github.com/apache/spark/pull/32092.
>>>

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
The fix will be straightforward. In the GitHub Actions workflow, we can either:
- remove the fast-forward option and see if it works
- or git rebase before merging the branch (a rough sketch of this option follows)
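For illustration, the second option could look roughly like the following
workflow step. The step name and commands here are illustrative only, not
the actual contents of apache/spark's workflow file:

# Illustrative sketch of "git rebase before merging the branch" as a
# GitHub Actions step; not the real apache/spark workflow definition.
- name: Sync the PR branch with the latest Apache Spark master
  run: |
    git config user.name "github-actions"
    git config user.email "github-actions@users.noreply.github.com"
    git fetch https://github.com/apache/spark.git master
    git rebase FETCH_HEAD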

2021년 4월 15일 (목) 오전 11:00, Hyukjin Kwon 님이 작성:

> I think it works mostly correctly as Dongjoon investigated and shared
> (Thanks a lot!).
> One problem is that the sync-to-master check seems too strict (
> https://github.com/apache/spark/pull/32168#issuecomment-819736508).
> Thanks, Yikun.
> I think we should make it less strict. I can create a PR right away, but I
> would like to encourage Yikun or Kent to do it, to keep the credit for
> their investigation.
>
> 2021년 4월 15일 (목) 오전 7:21, Dongjoon Hyun 님이 작성:
>
>> Hi, Kent.
>>
>> I checked (1) in your PR, but those test result comments look correct to
>> me.
>> Please note that both Jenkins and GitHub Action leave the same number of
>> comments on the same GitHash.
>> Given that, these are not fake comments. It looks like a real result of
>> your commits on that PR.
>>
>> GitHash: 23248c3
>>  https://github.com/apache/spark/pull/32144#issuecomment-819679970
>> (GitHub Action)
>>  https://github.com/apache/spark/pull/32144#issuecomment-819647368
>> (Jenkins)
>>
>> GitHash: 8dbed7b
>> https://github.com/apache/spark/pull/32144#issuecomment-819684782
>> (GitHub Action)
>> https://github.com/apache/spark/pull/32144#issuecomment-819578976
>> (Jenkins)
>>
>> GitHash: a3a6c5e
>> https://github.com/apache/spark/pull/32144#issuecomment-819690465
>> (GitHub Action)
>> https://github.com/apache/spark/pull/32144#issuecomment-819793557
>> (Jenkins)
>>
>> GitHash: b6d26b7
>> https://github.com/apache/spark/pull/32144#issuecomment-819691416
>> (GitHub Action)
>> https://github.com/apache/spark/pull/32144#issuecomment-819791485
>> (Jenkins)
>>
>> Could you recheck it?
>>
>>
>> 1. Github-actions notification could be wrong when another PR is opened
>>> with some of the same commits, and you will get a lot of fake comments then.
>>> Meanwhile, the new PR gets no comments, even if it is actually the
>>> chosen one.
>>>1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>>
>>
>>
>> On Wed, Apr 14, 2021 at 10:41 AM Kent Yao  wrote:
>>
>>> Hi ALL, here is something I noticed after this change:
>>>
>>> 1. Github-actions notification could be wrong when another PR is opened
>>> with some of the same commits, and you will get a lot of fake comments then.
>>> Meanwhile, the new PR gets no comments, even if it is actually the
>>> chosen one.
>>>1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>> 2. New forks have to have GitHub Actions turned on manually by the fork owner
>>> 3. `Notify test workflow` keeps waiting when the build workflow is canceled
>>> or the whole fork is gone
>>> 4. After refreshing master or even re-forking :(, I still got failures,
>>> and it seems I am not alone
>>>4.1. https://github.com/apache/spark/pull/32168 (PR after sync)
>>>4.2. https://github.com/apache/spark/pull/32172 (PR after re-forked)
>>>4.3.
>>> https://github.com/attilapiros/spark/runs/2344911058?check_suite_focus=true
>>> (some other failures noticed)
>>>
>>>
>>> Bests,
>>>
>>> Kent
>>>
>>> Dongjoon Hyun  于2021年4月14日周三 下午11:34写道:
>>> >
>>> > Thank you again, Hyukjin.
>>> >
>>> > Bests,
>>> > Dongjoon.
>>> >
>>> > On Wed, Apr 14, 2021 at 5:25 AM Kent Yao  wrote:
>>> >>
>>> >> Cool, thanks!
>>> >>
>>> >> Hyukjin Kwon  于2021年4月14日周三 下午8:19写道:
>>> >>>
>>> >>> Good point! I had to clarify.
>>> >>> Once is enough. The sync is needed for your branch to include the
>>> changes of https://github.com/apache/spark/pull/32092.
>>> >>>
>>> >>>
>>> >>> 2021년 4월 14일 (수) 오후 9:11, Kent Yao 님이 작성:
>>> >>>>
>>> >>>> Hi Hyukjin,
>>> >>>>
>>> >>>> > Please sync your branch to the latest master branch in Apache
>>> Spark in order for the main repository to run the workflow and detect it.
>>> >>>>
>>> >>>> Do we need to sync master for every PR or just one-time cost to
>>> keep up with the current master branch?
>>> >>>>
>>> >>>> Kent

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
I think it works mostly correctly as Dongjoon investigated and shared
(Thanks a lot!).
One problem is that the sync-to-master check seems too strict (
https://github.com/apache/spark/pull/32168#issuecomment-819736508). Thanks,
Yikun.
I think we should make it less strict. I can create a PR right away, but I
would like to encourage Yikun or Kent to do it, to keep the credit for
their investigation.

2021년 4월 15일 (목) 오전 7:21, Dongjoon Hyun 님이 작성:

> Hi, Kent.
>
> I checked (1) in your PR, but those test result comments look correct to
> me.
> Please note that both Jenkins and GitHub Action leave the same number of
> comments on the same GitHash.
> Given that, these are not fake comments. It looks like a real result of
> your commits on that PR.
>
> GitHash: 23248c3
>  https://github.com/apache/spark/pull/32144#issuecomment-819679970
> (GitHub Action)
>  https://github.com/apache/spark/pull/32144#issuecomment-819647368
> (Jenkins)
>
> GitHash: 8dbed7b
> https://github.com/apache/spark/pull/32144#issuecomment-819684782
> (GitHub Action)
> https://github.com/apache/spark/pull/32144#issuecomment-819578976
> (Jenkins)
>
> GitHash: a3a6c5e
> https://github.com/apache/spark/pull/32144#issuecomment-819690465
> (GitHub Action)
> https://github.com/apache/spark/pull/32144#issuecomment-819793557
> (Jenkins)
>
> GitHash: b6d26b7
> https://github.com/apache/spark/pull/32144#issuecomment-819691416
> (GitHub Action)
> https://github.com/apache/spark/pull/32144#issuecomment-819791485
> (Jenkins)
>
> Could you recheck it?
>
>
> 1. Github-actions notification could be wrong when another PR is opened
>> with some of the same commits, and you will get a lot of fake comments then.
>> Meanwhile, the new PR gets no comments, even if it is actually the
>> chosen one.
>>1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
>>
>
>
> On Wed, Apr 14, 2021 at 10:41 AM Kent Yao  wrote:
>
>> Hi ALL, here is something I noticed after this change:
>>
>> 1. Github-actions notification could be wrong when another PR is opened
>> with some of the same commits, and you will get a lot of fake comments then.
>> Meanwhile, the new PR gets no comments, even if it is actually the
>> chosen one.
>>1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
>> 2. New forks have to have GitHub Actions turned on manually by the fork owner
>> 3. `Notify test workflow` keeps waiting when the build workflow is canceled
>> or the whole fork is gone
>> 4. After refreshing master or even re-forking :(, I still got failures,
>> and it seems I am not alone
>>4.1. https://github.com/apache/spark/pull/32168 (PR after sync)
>>4.2. https://github.com/apache/spark/pull/32172 (PR after re-forked)
>>4.3.
>> https://github.com/attilapiros/spark/runs/2344911058?check_suite_focus=true
>> (some other failures noticed)
>>
>>
>> Bests,
>>
>> Kent
>>
>> Dongjoon Hyun  于2021年4月14日周三 下午11:34写道:
>> >
>> > Thank you again, Hyukjin.
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> > On Wed, Apr 14, 2021 at 5:25 AM Kent Yao  wrote:
>> >>
>> >> Cool, thanks!
>> >>
>> >> Hyukjin Kwon  于2021年4月14日周三 下午8:19写道:
>> >>>
>> >>> Good point! I had to clarify.
>> >>> Once is enough. The sync is needed for your branch to include the
>> changes of https://github.com/apache/spark/pull/32092.
>> >>>
>> >>>
>> >>> 2021년 4월 14일 (수) 오후 9:11, Kent Yao 님이 작성:
>> >>>>
>> >>>> Hi Hyukjin,
>> >>>>
>> >>>> > Please sync your branch to the latest master branch in Apache
>> Spark in order for the main repository to run the workflow and detect it.
>> >>>>
>> >>>> Do we need to sync master for every PR or just one-time cost to keep
>> up with the current master branch?
>> >>>>
>> >>>> Kent Yao
>> >>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>> >>>> a spark enthusiast
>> >>>> kyuubi is a unified multi-tenant JDBC interface for large-scale data
>> processing and analytics, built on top of Apache Spark.
>> >>>>
>> >>>> spark-authorizer A Spark SQL extension which provides SQL Standard
>> Authorization for Apache Spark.
>> >>>> spark-postgres A library for reading data from and transferring data
>> to Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
>> >>>

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Dongjoon Hyun
Hi, Kent.

I checked (1) in your PR, but those test result comments look correct to me.
Please note that both Jenkins and GitHub Action leave the same number of
comments on the same GitHash.
Given that, these are not fake comments. It looks like a real result of
your commits on that PR.

GitHash: 23248c3
 https://github.com/apache/spark/pull/32144#issuecomment-819679970
(GitHub Action)
 https://github.com/apache/spark/pull/32144#issuecomment-819647368
(Jenkins)

GitHash: 8dbed7b
https://github.com/apache/spark/pull/32144#issuecomment-819684782
(GitHub Action)
https://github.com/apache/spark/pull/32144#issuecomment-819578976
(Jenkins)

GitHash: a3a6c5e
https://github.com/apache/spark/pull/32144#issuecomment-819690465
(GitHub Action)
https://github.com/apache/spark/pull/32144#issuecomment-819793557
(Jenkins)

GitHash: b6d26b7
https://github.com/apache/spark/pull/32144#issuecomment-819691416
(GitHub Action)
https://github.com/apache/spark/pull/32144#issuecomment-819791485
(Jenkins)

Could you recheck it?


1. Github-actions notification could be wrong when another PR is opened
> with some of the same commits, and you will get a lot of fake comments then.
> Meanwhile, the new PR gets no comments, even if it is actually the
> chosen one.
>1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
>


On Wed, Apr 14, 2021 at 10:41 AM Kent Yao  wrote:

> Hi ALL, here is something I noticed after this change:
>
> 1. Github-actions notification could be wrong when another PR is opened
> with some of the same commits, and you will get a lot of fake comments then.
> Meanwhile, the new PR gets no comments, even if it is actually the
> chosen one.
>1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
> 2. New forks have to have GitHub Actions turned on manually by the fork owner
> 3. `Notify test workflow` keeps waiting when the build workflow is canceled
> or the whole fork is gone
> 4. After refreshing master or even re-forking :(, I still got failures,
> and it seems I am not alone
>4.1. https://github.com/apache/spark/pull/32168 (PR after sync)
>4.2. https://github.com/apache/spark/pull/32172 (PR after re-forked)
>4.3.
> https://github.com/attilapiros/spark/runs/2344911058?check_suite_focus=true
> (some other failures noticed)
>
>
> Bests,
>
> Kent
>
> Dongjoon Hyun  于2021年4月14日周三 下午11:34写道:
> >
> > Thank you again, Hyukjin.
> >
> > Bests,
> > Dongjoon.
> >
> > On Wed, Apr 14, 2021 at 5:25 AM Kent Yao  wrote:
> >>
> >> Cool, thanks!
> >>
> >> Hyukjin Kwon  于2021年4月14日周三 下午8:19写道:
> >>>
> >>> Good point! I had to clarify.
> >>> Once is enough. The sync is needed for your branch to include the
> changes of https://github.com/apache/spark/pull/32092.
> >>>
> >>>
> >>> 2021년 4월 14일 (수) 오후 9:11, Kent Yao 님이 작성:
> >>>>
> >>>> Hi Hyukjin,
> >>>>
> >>>> > Please sync your branch to the latest master branch in Apache Spark
> in order for the main repository to run the workflow and detect it.
> >>>>
> >>>> Do we need to sync master for every PR or just one-time cost to keep
> up with the current master branch?
> >>>>
> >>>> Kent Yao
> >>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
> >>>> a spark enthusiast
> >>>> kyuubi is a unified multi-tenant JDBC interface for large-scale data
> processing and analytics, built on top of Apache Spark.
> >>>>
> >>>> spark-authorizer A Spark SQL extension which provides SQL Standard
> Authorization for Apache Spark.
> >>>> spark-postgres A library for reading data from and transferring data
> to Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
> >>>> spark-func-extras A library that brings excellent and useful functions
> from various modern database management systems to Apache Spark.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 04/14/2021 15:41,Kent Yao wrote:
> >>>>
> >>>> Cool~Thanks, Hyukjin
> >>>>
> >>>> Yuanjian Li  于2021年4月14日周三 下午3:39写道:
> >>>>>
> >>>>> Awesome! Thanks for making this happen, Hyukjin!
> >>>>>
> >>>>> Yi Wu  于2021年4月14日周三 下午2:51写道:
> >>>>>>
> >>>>>> Thanks for the great work, Hyukjin!
> >>>>>>
> >>>>>> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang 
> wrote:
> >>>>>>>
> >>>

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Kent Yao
Hi ALL, here is something I noticed after this change:

1. Github-actions notification could be wrong when another PR is opened
with some of the same commits, and you will get a lot of fake comments then.
Meanwhile, the new PR gets no comments, even if it is actually the
chosen one.
   1.1 https://github.com/apache/spark/pull/32144#issuecomment-819679970
2. New forks have to have GitHub Actions turned on manually by the fork owner
3. `Notify test workflow` keeps waiting when the build workflow is canceled
or the whole fork is gone
4. After refreshing master or even re-forking :(, I still got failures,
and it seems I am not alone
   4.1. https://github.com/apache/spark/pull/32168 (PR after sync)
   4.2. https://github.com/apache/spark/pull/32172 (PR after re-forked)
   4.3. 
https://github.com/attilapiros/spark/runs/2344911058?check_suite_focus=true
(some other failures noticed)


Bests,

Kent

Dongjoon Hyun  于2021年4月14日周三 下午11:34写道:
>
> Thank you again, Hyukjin.
>
> Bests,
> Dongjoon.
>
> On Wed, Apr 14, 2021 at 5:25 AM Kent Yao  wrote:
>>
>> Cool, thanks!
>>
>> Hyukjin Kwon  于2021年4月14日周三 下午8:19写道:
>>>
>>> Good point! I had to clarify.
>>> Once is enough. The sync is needed for your branch to include the changes 
>>> of https://github.com/apache/spark/pull/32092.
>>>
>>>
>>> 2021년 4월 14일 (수) 오후 9:11, Kent Yao 님이 작성:
>>>>
>>>> Hi Hyukjin,
>>>>
>>>> > Please sync your branch to the latest master branch in Apache Spark in 
>>>> > order for the main repository to run the workflow and detect it.
>>>>
>>>> Do we need to sync master for every PR or just one-time cost to keep up 
>>>> with the current master branch?
>>>>
>>>> Kent Yao
>>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>>> a spark enthusiast
>>>> kyuubi is a unified multi-tenant JDBC interface for large-scale data
>>>> processing and analytics, built on top of Apache Spark.
>>>>
>>>> spark-authorizer A Spark SQL extension which provides SQL Standard
>>>> Authorization for Apache Spark.
>>>> spark-postgres A library for reading data from and transferring data to 
>>>> Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
>>>> spark-func-extras A library that brings excellent and useful functions from
>>>> various modern database management systems to Apache Spark.
>>>>
>>>>
>>>>
>>>>
>>>> On 04/14/2021 15:41,Kent Yao wrote:
>>>>
>>>> Cool~Thanks, Hyukjin
>>>>
>>>> Yuanjian Li  于2021年4月14日周三 下午3:39写道:
>>>>>
>>>>> Awesome! Thanks for making this happen, Hyukjin!
>>>>>
>>>>> Yi Wu  于2021年4月14日周三 下午2:51写道:
>>>>>>
>>>>>> Thanks for the great work, Hyukjin!
>>>>>>
>>>>>> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang  wrote:
>>>>>>>
>>>>>>> Thanks for the amazing work, Hyukjin!
>>>>>>> I created a PR for trial and it looks good so far:
>>>>>>> https://github.com/apache/spark/pull/32158
>>>>>>>
>>>>>>> On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon  
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> After https://github.com/apache/spark/pull/32092 merged, now we run 
>>>>>>>> the GitHub Actions
>>>>>>>> workflows in your forked repository.
>>>>>>>>
>>>>>>>> In short, please see this example HyukjinKwon#34
>>>>>>>>
>>>>>>>> You create a PR and your repository triggers the workflow. Your PR 
>>>>>>>> uses the resources allocated to you for testing.
>>>>>>>> Apache Spark repository finds your workflow, and links it in a comment 
>>>>>>>> in your PR
>>>>>>>>
>>>>>>>> Please let me know if you guys find any weird behaviour related to 
>>>>>>>> this.
>>>>>>>>
>>>>>>>>
>>>>>>>> What does that mean to contributors?
>>>>>>>>
>>>>>>>> Please sync your branch to the latest master branch in Apache Spark in 
>>>>>>>> order for your forked repository to run the workflow, and
>>>>>>>>

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Dongjoon Hyun
Thank you again, Hyukjin.

Bests,
Dongjoon.

On Wed, Apr 14, 2021 at 5:25 AM Kent Yao  wrote:

> Cool, thanks!
>
> Hyukjin Kwon  于2021年4月14日周三 下午8:19写道:
>
>> Good point! I had to clarify.
>> Once is enough. The sync is needed for your branch to include the changes
>> of https://github.com/apache/spark/pull/32092.
>>
>>
>> 2021년 4월 14일 (수) 오후 9:11, Kent Yao 님이 작성:
>>
>>> Hi Hyukjin,
>>>
>>> > Please sync your branch to the latest master branch in Apache Spark in
>>> order for the main repository to run the workflow and detect it.
>>>
>>> Do we need to sync master for every PR or just one-time cost to keep up
>>> with the current master branch?
>>>
>>> *Kent Yao *
>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>> *a spark enthusiast*
>>> *kyuubi is a
>>> unified multi-tenant JDBC interface for large-scale data processing and
>>> analytics, built on top of Apache Spark .*
>>> *spark-authorizer A Spark
>>> SQL extension which provides SQL Standard Authorization for **Apache
>>> Spark .*
>>> *spark-postgres  A library
>>> for reading data from and transferring data to Postgres / Greenplum with
>>> Spark SQL and DataFrames, 10~100x faster.*
>>> *spark-func-extras A
>>> library that brings excellent and useful functions from various modern
>>> database management systems to Apache Spark .*
>>>
>>>
>>>
>>> On 04/14/2021 15:41,Kent Yao  wrote:
>>>
>>> Cool~Thanks, Hyukjin
>>>
>>> Yuanjian Li  于2021年4月14日周三 下午3:39写道:
>>>
 Awesome! Thanks for making this happen, Hyukjin!

 Yi Wu  于2021年4月14日周三 下午2:51写道:

> Thanks for the great work, Hyukjin!
>
> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang 
> wrote:
>
>> Thanks for the amazing work, Hyukjin!
>> I created a PR for trial and it looks good so far:
>> https://github.com/apache/spark/pull/32158
>>
>> On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> After https://github.com/apache/spark/pull/32092 merged, now we run
>>> the GitHub Actions
>>> workflows in your forked repository.
>>>
>>> In short, please see this example HyukjinKwon#34
>>> 
>>>
>>>1. You create a PR and your repository triggers the workflow.
>>>Your PR uses the resources allocated to you for testing.
>>>2. Apache Spark repository finds your workflow, and links it in
>>>a comment in your PR
>>>
>>> Please let me know if you guys find any weird behaviour related to
>>> this.
>>>
>>>
>>> *What does that mean to contributors?*
>>>
>>> Please sync your branch to the latest master branch in Apache Spark
>>> in order for your forked repository to run the workflow, and
>>> for the main repository to detect the workflow.
>>>
>>>
>>> *What does that mean to committers?*
>>>
>>> Now, GitHub Actions will show a green status even when GitHub Actions
>>> builds are running (in contributor's forked repository).
>>> Please check the build notified by github-actions bot before merging
>>> it.
>>> There will be follow-up work to reflect the status of the forked
>>> repository's build to the status of PR.
>>>
>>> 2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:
>>>
 Hi all,

 After https://github.com/apache/spark/pull/32092 merged, now we
 run the GitHub Actions
 workflows in your forked repository.

 In short, please see this example HyukjinKwon#34
 

1. You create a PR and your repository triggers the workflow.
Your PR uses the resources allocated to you for testing.
2. Apache Spark repository finds your workflow, and links it in
a comment in your PR

 Please let me know if you guys find any weird behaviour related to
 this.


 *What does that mean to contributors?*

 Please sync your branch to the latest master branch in Apache Spark
 in order for the main repository to run the workflow and detect it.


 *What does that mean to committers?*

 Now, GitHub Actions will show a green status even when GitHub Actions
 builds are running (in contributor's forked repository). Please check 
 the
 build notified by github-actions bot before merging it.
 There will be follow-up work to reflect the status of the forked
 repository's build to
 the status of PR.



>
> --
>
>
>


Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Kent Yao
Cool, thanks!

Hyukjin Kwon  于2021年4月14日周三 下午8:19写道:

> Good point! I had to clarify.
> Once is enough. The sync is needed for your branch to include the changes
> of https://github.com/apache/spark/pull/32092.
>
>
> 2021년 4월 14일 (수) 오후 9:11, Kent Yao 님이 작성:
>
>> Hi Hyukjin,
>>
>> > Please sync your branch to the latest master branch in Apache Spark in
>> order for the main repository to run the workflow and detect it.
>>
>> Do we need to sync master for every PR or just one-time cost to keep up
>> with the current master branch?
>>
>> *Kent Yao *
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>> *a spark enthusiast*
>> *kyuubi is a
>> unified multi-tenant JDBC interface for large-scale data processing and
>> analytics, built on top of Apache Spark .*
>> *spark-authorizer A Spark
>> SQL extension which provides SQL Standard Authorization for **Apache
>> Spark .*
>> *spark-postgres  A library
>> for reading data from and transferring data to Postgres / Greenplum with
>> Spark SQL and DataFrames, 10~100x faster.*
>> *spark-func-extras A
>> library that brings excellent and useful functions from various modern
>> database management systems to Apache Spark .*
>>
>>
>>
>> On 04/14/2021 15:41,Kent Yao  wrote:
>>
>> Cool~Thanks, Hyukjin
>>
>> Yuanjian Li  于2021年4月14日周三 下午3:39写道:
>>
>>> Awesome! Thanks for making this happen, Hyukjin!
>>>
>>> Yi Wu  于2021年4月14日周三 下午2:51写道:
>>>
 Thanks for the great work, Hyukjin!

 On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang 
 wrote:

> Thanks for the amazing work, Hyukjin!
> I created a PR for trial and it looks good so far:
> https://github.com/apache/spark/pull/32158
>
> On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> After https://github.com/apache/spark/pull/32092 merged, now we run
>> the GitHub Actions
>> workflows in your forked repository.
>>
>> In short, please see this example HyukjinKwon#34
>> 
>>
>>1. You create a PR and your repository triggers the workflow.
>>Your PR uses the resources allocated to you for testing.
>>2. Apache Spark repository finds your workflow, and links it in a
>>comment in your PR
>>
>> Please let me know if you guys find any weird behaviour related to
>> this.
>>
>>
>> *What does that mean to contributors?*
>>
>> Please sync your branch to the latest master branch in Apache Spark
>> in order for your forked repository to run the workflow, and
>> for the main repository to detect the workflow.
>>
>>
>> *What does that mean to committers?*
>>
>> Now, GitHub Actions will show a green status even when GitHub Actions builds
>> are running (in contributor's forked repository).
>> Please check the build notified by github-actions bot before merging
>> it.
>> There will be follow-up work to reflect the status of the forked
>> repository's build to the status of PR.
>>
>> 2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:
>>
>>> Hi all,
>>>
>>> After https://github.com/apache/spark/pull/32092 merged, now we run
>>> the GitHub Actions
>>> workflows in your forked repository.
>>>
>>> In short, please see this example HyukjinKwon#34
>>> 
>>>
>>>1. You create a PR and your repository triggers the workflow.
>>>Your PR uses the resources allocated to you for testing.
>>>2. Apache Spark repository finds your workflow, and links it in
>>>a comment in your PR
>>>
>>> Please let me know if you guys find any weird behaviour related to
>>> this.
>>>
>>>
>>> *What does that mean to contributors?*
>>>
>>> Please sync your branch to the latest master branch in Apache Spark
>>> in order for the main repository to run the workflow and detect it.
>>>
>>>
>>> *What does that mean to committers?*
>>>
>>> Now, GitHub Actions will show a green status even when GitHub Actions
>>> builds are running (in contributor's forked repository). Please check 
>>> the
>>> build notified by github-actions bot before merging it.
>>> There will be follow-up work to reflect the status of the forked
>>> repository's build to
>>> the status of PR.
>>>
>>>
>>>

--


Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
Good point! I had to clarify.
Once is enough. The sync is needed for your branch to include the changes
of https://github.com/apache/spark/pull/32092.


2021년 4월 14일 (수) 오후 9:11, Kent Yao 님이 작성:

> Hi Hyukjin,
>
> > Please sync your branch to the latest master branch in Apache Spark in
> order for the main repository to run the workflow and detect it.
>
> Do we need to sync master for every PR or just one-time cost to keep up
> with the current master branch?
>
> *Kent Yao *
> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
> *a spark enthusiast*
> *kyuubi is a unified multi-tenant JDBC
> interface for large-scale data processing and analytics, built on top
> of Apache Spark .*
> *spark-authorizer A Spark
> SQL extension which provides SQL Standard Authorization for **Apache
> Spark .*
> *spark-postgres  A library for
> reading data from and transferring data to Postgres / Greenplum with Spark
> SQL and DataFrames, 10~100x faster.*
> *spark-func-extras A
> library that brings excellent and useful functions from various modern
> database management systems to Apache Spark .*
>
>
>
> On 04/14/2021 15:41,Kent Yao  wrote:
>
> Cool~Thanks, Hyukjin
>
> Yuanjian Li  于2021年4月14日周三 下午3:39写道:
>
>> Awesome! Thanks for making this happen, Hyukjin!
>>
>> Yi Wu  于2021年4月14日周三 下午2:51写道:
>>
>>> Thanks for the great work, Hyukjin!
>>>
>>> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang  wrote:
>>>
 Thanks for the amazing work, Hyukjin!
 I created a PR for trial and it looks good so far:
 https://github.com/apache/spark/pull/32158

 On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon 
 wrote:

> Hi all,
>
> After https://github.com/apache/spark/pull/32092 merged, now we run
> the GitHub Actions
> workflows in your forked repository.
>
> In short, please see this example HyukjinKwon#34
> 
>
>1. You create a PR and your repository triggers the workflow. Your
>PR uses the resources allocated to you for testing.
>2. Apache Spark repository finds your workflow, and links it in a
>comment in your PR
>
> Please let me know if you guys find any weird behaviour related to
> this.
>
>
> *What does that mean to contributors?*
>
> Please sync your branch to the latest master branch in Apache Spark in
> order for your forked repository to run the workflow, and
> for the main repository to detect the workflow.
>
>
> *What does that mean to committers?*
>
> Now, GitHub Actions will show a green status even when GitHub Actions builds
> are running (in contributor's forked repository).
> Please check the build notified by github-actions bot before merging
> it.
> There will be follow-up work to reflect the status of the forked
> repository's build to the status of PR.
>
> 2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:
>
>> Hi all,
>>
>> After https://github.com/apache/spark/pull/32092 merged, now we run
>> the GitHub Actions
>> workflows in your forked repository.
>>
>> In short, please see this example HyukjinKwon#34
>> 
>>
>>1. You create a PR and your repository triggers the workflow.
>>Your PR uses the resources allocated to you for testing.
>>2. Apache Spark repository finds your workflow, and links it in a
>>comment in your PR
>>
>> Please let me know if you guys find any weird behaviour related to
>> this.
>>
>>
>> *What does that mean to contributors?*
>>
>> Please sync your branch to the latest master branch in Apache Spark
>> in order for the main repository to run the workflow and detect it.
>>
>>
>> *What does that mean to committers?*
>>
>> Now, GitHub Actions will show a green status even when GitHub Actions builds
>> are running (in contributor's forked repository). Please check the build
>> notified by github-actions bot before merging it.
>> There will be follow-up work to reflect the status of the forked
>> repository's build to
>> the status of PR.
>>
>>
>>


Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Kent Yao






Hi Hyukjin,

> Please sync your branch to the latest master branch in Apache Spark in
> order for the main repository to run the workflow and detect it.

Do we need to sync master for every PR, or is it a one-time cost to keep up
with the current master branch?

Kent Yao
@ Data Science Center, Hangzhou Research Institute, NetEase Corp.
a spark enthusiast
kyuubi is a unified multi-tenant JDBC interface for large-scale data
processing and analytics, built on top of Apache Spark.
spark-authorizer: A Spark SQL extension which provides SQL Standard
Authorization for Apache Spark.
spark-postgres: A library for reading data from and transferring data to
Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
spark-func-extras: A library that brings excellent and useful functions
from various modern database management systems to Apache Spark.

On 04/14/2021 15:41, Kent Yao wrote:


Cool~ Thanks, Hyukjin

Yuanjian Li 于2021年4月14日周三 下午3:39写道:

Awesome! Thanks for making this happen, Hyukjin!

Yi Wu 于2021年4月14日周三 下午2:51写道:

Thanks for the great work, Hyukjin!

On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang wrote:

Thanks for the amazing work, Hyukjin!
I created a PR for trial and it looks good so far:
https://github.com/apache/spark/pull/32158

On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon wrote:

Hi all,

After https://github.com/apache/spark/pull/32092 merged, now we run the
GitHub Actions workflows in your forked repository.

In short, please see this example HyukjinKwon#34

1. You create a PR and your repository triggers the workflow. Your PR uses
the resources allocated to you for testing.
2. Apache Spark repository finds your workflow, and links it in a comment
in your PR

Please let me know if you guys find any weird behaviour related to this.

What does that mean to contributors?

Please sync your branch to the latest master branch in Apache Spark in
order for your forked repository to run the workflow, and for the main
repository to detect the workflow.

What does that mean to committers?

Now, GitHub Actions will show a green status even when GitHub Actions
builds are running (in the contributor's forked repository). Please check
the build notified by the github-actions bot before merging it. There will
be follow-up work to reflect the status of the forked repository's build to
the status of the PR.

2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:

Hi all,

After https://github.com/apache/spark/pull/32092 merged, now we run the
GitHub Actions workflows in your forked repository.

In short, please see this example HyukjinKwon#34

1. You create a PR and your repository triggers the workflow. Your PR uses
the resources allocated to you for testing.
2. Apache Spark repository finds your workflow, and links it in a comment
in your PR

Please let me know if you guys find any weird behaviour related to this.

What does that mean to contributors?

Please sync your branch to the latest master branch in Apache Spark in
order for the main repository to run the workflow and detect it.

What does that mean to committers?

Now, GitHub Actions will show a green status even when GitHub Actions
builds are running (in the contributor's forked repository). Please check
the build notified by the github-actions bot before merging it. There will
be follow-up work to reflect the status of the forked repository's build to
the status of the PR.










Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Kent Yao
Cool~Thanks, Hyukjin

Yuanjian Li  于2021年4月14日周三 下午3:39写道:

> Awesome! Thanks for making this happen, Hyukjin!
>
> Yi Wu  于2021年4月14日周三 下午2:51写道:
>
>> Thanks for the great work, Hyukjin!
>>
>> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang  wrote:
>>
>>> Thanks for the amazing work, Hyukjin!
>>> I created a PR for trial and it looks good so far:
>>> https://github.com/apache/spark/pull/32158
>>>
>>> On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon 
>>> wrote:
>>>
 Hi all,

 After https://github.com/apache/spark/pull/32092 merged, now we run
 the GitHub Actions
 workflows in your forked repository.

 In short, please see this example HyukjinKwon#34
 

1. You create a PR and your repository triggers the workflow. Your
PR uses the resources allocated to you for testing.
2. Apache Spark repository finds your workflow, and links it in a
comment in your PR

 Please let me know if you guys find any weird behaviour related to this.


 *What does that mean to contributors?*

 Please sync your branch to the latest master branch in Apache Spark in
 order for your forked repository to run the workflow, and
 for the main repository to detect the workflow.


 *What does that mean to committers?*

 Now, GitHub Actions will show a green status even when GitHub Actions builds
 are running (in contributor's forked repository).
 Please check the build notified by github-actions bot before merging it.
 There will be follow-up work to reflect the status of the forked
 repository's build to the status of PR.

 2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:

> Hi all,
>
> After https://github.com/apache/spark/pull/32092 merged, now we run
> the GitHub Actions
> workflows in your forked repository.
>
> In short, please see this example HyukjinKwon#34
> 
>
>1. You create a PR and your repository triggers the workflow. Your
>PR uses the resources allocated to you for testing.
>2. Apache Spark repository finds your workflow, and links it in a
>comment in your PR
>
> Please let me know if you guys find any weird behaviour related to
> this.
>
>
> *What does that mean to contributors?*
>
> Please sync your branch to the latest master branch in Apache Spark in
> order for the main repository to run the workflow and detect it.
>
>
> *What does that mean to committers?*
>
> Now, GitHub Actions will show a green status even when GitHub Actions builds
> are running (in contributor's forked repository). Please check the build
> notified by github-actions bot before merging it.
> There will be follow-up work to reflect the status of the forked
> repository's build to
> the status of PR.
>
>
>


Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Yuanjian Li
Awesome! Thanks for making this happen, Hyukjin!

Yi Wu  于2021年4月14日周三 下午2:51写道:

> Thanks for the great work, Hyukjin!
>
> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang  wrote:
>
>> Thanks for the amazing work, Hyukjin!
>> I created a PR for trial and it looks good so far:
>> https://github.com/apache/spark/pull/32158
>>
>> On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> After https://github.com/apache/spark/pull/32092 merged, now we run the
>>> GitHub Actions
>>> workflows in your forked repository.
>>>
>>> In short, please see this example HyukjinKwon#34
>>> 
>>>
>>>1. You create a PR and your repository triggers the workflow. Your
>>>PR uses the resources allocated to you for testing.
>>>2. Apache Spark repository finds your workflow, and links it in a
>>>comment in your PR
>>>
>>> Please let me know if you guys find any weird behaviour related to this.
>>>
>>>
>>> *What does that mean to contributors?*
>>>
>>> Please sync your branch to the latest master branch in Apache Spark in
>>> order for your forked repository to run the workflow, and
>>> for the main repository to detect the workflow.
>>>
>>>
>>> *What does that mean to committers?*
>>>
>>> Now, GitHub Actions will show a green status even when GitHub Actions builds
>>> are running (in contributor's forked repository).
>>> Please check the build notified by github-actions bot before merging it.
>>> There will be follow-up work to reflect the status of the forked
>>> repository's build to the status of PR.
>>>
>>> 2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:
>>>
 Hi all,

 After https://github.com/apache/spark/pull/32092 merged, now we run
 the GitHub Actions
 workflows in your forked repository.

 In short, please see this example HyukjinKwon#34
 

1. You create a PR and your repository triggers the workflow. Your
PR uses the resources allocated to you for testing.
2. Apache Spark repository finds your workflow, and links it in a
comment in your PR

 Please let me know if you guys find any weird behaviour related to this.


 *What does that mean to contributors?*

 Please sync your branch to the latest master branch in Apache Spark in
 order for the main repository to run the workflow and detect it.


 *What does that mean to committers?*

 Now, GitHub Actions will show a green status even when GitHub Actions builds
 are running (in contributor's forked repository). Please check the build
 notified by github-actions bot before merging it.
 There will be follow-up work to reflect the status of the forked
 repository's build to
 the status of PR.





Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Yi Wu
Thanks for the great work, Hyukjin!

On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang  wrote:

> Thanks for the amazing work, Hyukjin!
> I created a PR for trial and it looks good so far:
> https://github.com/apache/spark/pull/32158
>
> On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> After https://github.com/apache/spark/pull/32092 merged, now we run the
>> GitHub Actions
>> workflows in your forked repository.
>>
>> In short, please see this example HyukjinKwon#34
>> 
>>
>>1. You create a PR and your repository triggers the workflow. Your PR
>>uses the resources allocated to you for testing.
>>2. Apache Spark repository finds your workflow, and links it in a
>>comment in your PR
>>
>> Please let me know if you guys find any weird behaviour related to this.
>>
>>
>> *What does that mean to contributors?*
>>
>> Please sync your branch to the latest master branch in Apache Spark in
>> order for your forked repository to run the workflow, and
>> for the main repository to detect the workflow.
>>
>>
>> *What does that mean to committers?*
>>
>> Now, GitHub Actions will show a green status even when GitHub Actions builds are
>> running (in contributor's forked repository).
>> Please check the build notified by github-actions bot before merging it.
>> There will be follow-up work to reflect the status of the forked
>> repository's build to the status of PR.
>>
>> 2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:
>>
>>> Hi all,
>>>
>>> After https://github.com/apache/spark/pull/32092 merged, now we run the
>>> GitHub Actions
>>> workflows in your forked repository.
>>>
>>> In short, please see this example HyukjinKwon#34
>>> 
>>>
>>>1. You create a PR and your repository triggers the workflow. Your
>>>PR uses the resources allocated to you for testing.
>>>2. Apache Spark repository finds your workflow, and links it in a
>>>comment in your PR
>>>
>>> Please let me know if you guys find any weird behaviour related to this.
>>>
>>>
>>> *What does that mean to contributors?*
>>>
>>> Please sync your branch to the latest master branch in Apache Spark in
>>> order for the main repository to run the workflow and detect it.
>>>
>>>
>>> *What does that mean to committers?*
>>>
>>> Now, GitHub Actions will show a green status even when GitHub Actions
>>> are running (in contributor's forked repository). Please check the build
>>> notified by github-actions bot before merging it.
>>> There would be a followup work to reflect the status of the forked
>>> repository's build to
>>> the status of PR.
>>>
>>>
>>>


Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Gengliang Wang
Thanks for the amazing work, Hyukjin!
I created a PR for trial and it looks good so far:
https://github.com/apache/spark/pull/32158

On Wed, Apr 14, 2021 at 12:47 PM Hyukjin Kwon  wrote:

> Hi all,
>
> After https://github.com/apache/spark/pull/32092 merged, now we run the
> GitHub Actions
> workflows in your forked repository.
>
> In short, please see this example HyukjinKwon#34
> 
>
>1. You create a PR and your repository triggers the workflow. Your PR
>uses the resources allocated to you for testing.
>2. Apache Spark repository finds your workflow, and links it in a
>comment in your PR
>
> Please let me know if you guys find any weird behaviour related to this.
>
>
> *What does that mean to contributors?*
>
> Please sync your branch to the latest master branch in Apache Spark in
> order for your forked repository to run the workflow, and
> for the main repository to detect the workflow.
>
>
> *What does that mean to committers?*
>
> Now, GitHub Actions will show a green status even when GitHub Actions builds are
> running (in contributor's forked repository).
> Please check the build notified by github-actions bot before merging it.
> There will be follow-up work to reflect the status of the forked
> repository's build to the status of PR.
>
> 2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:
>
>> Hi all,
>>
>> After https://github.com/apache/spark/pull/32092 merged, now we run the
>> GitHub Actions
>> workflows in your forked repository.
>>
>> In short, please see this example HyukjinKwon#34
>> 
>>
>>1. You create a PR and your repository triggers the workflow. Your PR
>>uses the resources allocated to you for testing.
>>2. Apache Spark repository finds your workflow, and links it in a
>>comment in your PR
>>
>> Please let me know if you guys find any weird behaviour related to this.
>>
>>
>> *What does that mean to contributors?*
>>
>> Please sync your branch to the latest master branch in Apache Spark in
>> order for the main repository to run the workflow and detect it.
>>
>>
>> *What does that mean to committers?*
>>
>> Now, GitHub Actions will show a green status even when GitHub Actions builds are
>> running (in contributor's forked repository). Please check the build
>> notified by github-actions bot before merging it.
>> There will be follow-up work to reflect the status of the forked
>> repository's build to
>> the status of PR.
>>
>>
>>


Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Hyukjin Kwon
Hi all,

After https://github.com/apache/spark/pull/32092 merged, now we run the
GitHub Actions
workflows in your forked repository.

In short, please see this example HyukjinKwon#34


   1. You create a PR and your repository triggers the workflow. Your PR
   uses the resources allocated to you for testing.
   2. Apache Spark repository finds your workflow, and links it in a
   comment in your PR

Please let me know if you guys find any weird behaviour related to this.


*What does that mean to contributors?*

Please sync your branch to the latest master branch in Apache Spark in
order for your forked repository to run the workflow, and
for the main repository to detect the workflow.


*What does that mean to committers?*

Now, GitHub Actions will show a green status even when GitHub Actions builds are
running (in contributor's forked repository).
Please check the build notified by github-actions bot before merging it.
There will be follow-up work to reflect the status of the forked
repository's build to the status of PR.

2021년 4월 14일 (수) 오후 1:42, Hyukjin Kwon 님이 작성:

> Hi all,
>
> After https://github.com/apache/spark/pull/32092 merged, now we run the
> GitHub Actions
> workflows in your forked repository.
>
> In short, please see this example HyukjinKwon#34
> 
>
>1. You create a PR and your repository triggers the workflow. Your PR
>uses the resources allocated to you for testing.
>    2. The Apache Spark repository finds your workflow and links it in a
>    comment on your PR
>
> Please let me know if you guys find any weird behaviour related to this.
>
>
> *What does that mean to contributors?*
>
> Please sync your branch to the latest master branch in Apache Spark in
> order for the main repository to run the workflow and detect it.
>
>
> *What does that mean to committers?*
>
> Now, GitHub Actions will show a green status even when GitHub Actions builds
> are still running (in the contributor's forked repository). Please check the
> build notified by the github-actions bot before merging it.
> There will be follow-up work to reflect the status of the forked
> repository's build in the status of the PR.
>
>
>


[PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Hyukjin Kwon
Hi all,

After https://github.com/apache/spark/pull/32092 merged, now we run the
GitHub Actions
workflows in your forked repository.

In short, please see this example HyukjinKwon#34


   1. You create a PR and your repository triggers the workflow. Your PR
   uses the resources allocated to you for testing.
   2. The Apache Spark repository finds your workflow and links it in a
   comment on your PR

Please let me know if you guys find any weird behaviour related to this.


*What does that mean to contributors?*

Please sync your branch to the latest master branch in Apache Spark in
order for the main repository to run the workflow and detect it.


*What does that mean to committers?*

Now, GitHub Actions will show a green status even when GitHub Actions builds
are still running (in the contributor's forked repository). Please check the
build notified by the github-actions bot before merging it.
There will be follow-up work to reflect the status of the forked
repository's build in the status of the PR.


Re: K8s Integration test is unable to run because of the unavailable libs

2021-03-22 Thread Yikun Jiang
hey, Yi Wu

Looks like it's just an apt installation problem: we should run apt update
to refresh the local package cache before we install "gnupg".

I opened an issue on JIRA [1] and am trying to fix it in [2]; hope this helps.

[1] https://issues.apache.org/jira/browse/SPARK-34820
[2] https://github.com/apache/spark/pull/31923
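
For reference, the fix is roughly to refresh the package index in the same
step as the install, e.g. (a sketch assuming a Debian-based image; the exact
file and package list may differ):

    apt-get update && \
        apt-get install -y gnupg

Doing the update and the install together also avoids reusing a stale
cached package list.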

Regards,
Yikun


On Mon, Mar 22, 2021 at 2:15 PM Yi Wu  wrote:

> Hi devs,
>
> It seems like the K8s Integration test is unable to run recently because
> of the unavailable libs:
>
> Err:20 http://security.debian.org/debian-security buster/updates/main amd64 
> libldap-common all 2.4.47+dfsg-3+deb10u4
>   404  Not Found [IP: 151.101.194.132 80]
> Err:21 http://security.debian.org/debian-security buster/updates/main amd64 
> libldap-2.4-2 amd64 2.4.47+dfsg-3+deb10u4
>   404  Not Found [IP: 151.101.194.132 80]
> E: Failed to fetch 
> http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-common_2.4.47+dfsg-3+deb10u4_all.deb
>   404  Not Found [IP: 151.101.194.132 80]
> E: Failed to fetch 
> http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-2.4-2_2.4.47+dfsg-3+deb10u4_amd64.deb
>   404  Not Found [IP: 151.101.194.132 80]
>
>
> I already saw the error in many places, e.g.,
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40840/console
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40837/console
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40715/console
>
>
> Could someone familiar with K8s please take a look?
>
>
> Thanks,
>
> Yi
>
>
>


K8s Integration test is unable to run because of the unavailable libs

2021-03-21 Thread Yi Wu
Hi devs,

It seems like the K8s Integration test is unable to run recently because of
the unavailable libs:

Err:20 http://security.debian.org/debian-security buster/updates/main
amd64 libldap-common all 2.4.47+dfsg-3+deb10u4
  404  Not Found [IP: 151.101.194.132 80]
Err:21 http://security.debian.org/debian-security buster/updates/main
amd64 libldap-2.4-2 amd64 2.4.47+dfsg-3+deb10u4
  404  Not Found [IP: 151.101.194.132 80]
E: Failed to fetch
http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-common_2.4.47+dfsg-3+deb10u4_all.deb
 404  Not Found [IP: 151.101.194.132 80]
E: Failed to fetch
http://security.debian.org/debian-security/pool/updates/main/o/openldap/libldap-2.4-2_2.4.47+dfsg-3+deb10u4_amd64.deb
 404  Not Found [IP: 151.101.194.132 80]


I already saw the error in many places, e.g.,


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40840/console


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40837/console


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40715/console


Could someone familiar with K8s please take a look?


Thanks,

Yi


Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread shane knapp ☠
stupid bash variable assignment.  i'm surprised this has lingered for as
long as it has (3 years).

it's fixed and shouldn't be an issue any more.

On Tue, Feb 23, 2021 at 9:28 AM shane knapp ☠  wrote:

> the AmplabJenks bot's github creds are out of date, which is causing that
> non-fatal error.  however, if you scroll back you'll see that minikube
> actually failed to start.  that should have definitely failed the build, so
> i'll look at the job's bash logic and see what we missed.
>
> also, that worker (research-jenkins-worker-07) had some lingering builds
> running and i bet there was a collision w/a dangling minikube instance.
> i'm rebooting that worker now.
>
> shane
>
>
>
> On Tue, Feb 23, 2021 at 6:47 AM Sean Owen  wrote:
>
>> Shane would you know? May be a problem with a single worker.
>>
>> On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry 
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> Silly question: the Jenkins build for my PR is failing but it seems
>>> outside of my control. What must I do to remedy this?
>>>
>>> I've submitted
>>>
>>> https://github.com/apache/spark/pull/31535
>>>
>>> but Spark QA is telling me "Kubernetes integration test status failure".
>>>
>>> The Jenkins job says "SUCCESS" but also barfs with:
>>>
>>> FileNotFoundException means that the credentials Jenkins is using is 
>>> probably wrong. Or the user account does not have write access to the repo.
>>>
>>>
>>> See
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull
>>>
>>> Can anybody please advise?
>>>
>>> Thanks in advance.
>>>
>>> Phillip
>>>
>>>
>>>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread shane knapp ☠
the AmplabJenks bot's github creds are out of date, which is causing that
non-fatal error.  however, if you scroll back you'll see that minikube
actually failed to start.  that should have definitely failed the build, so
i'll look at the job's bash logic and see what we missed.

also, that worker (research-jenkins-worker-07) had some lingering builds
running and i bet there was a collision w/a dangling minikube instance.
i'm rebooting that worker now.

shane



On Tue, Feb 23, 2021 at 6:47 AM Sean Owen  wrote:

> Shane would you know? May be a problem with a single worker.
>
> On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry 
> wrote:
>
>>
>> Hi,
>>
>> Silly question: the Jenkins build for my PR is failing but it seems
>> outside of my control. What must I do to remedy this?
>>
>> I've submitted
>>
>> https://github.com/apache/spark/pull/31535
>>
>> but Spark QA is telling me "Kubernetes integration test status failure".
>>
>> The Jenkins job says "SUCCESS" but also barfs with:
>>
>> FileNotFoundException means that the credentials Jenkins is using is 
>> probably wrong. Or the user account does not have write access to the repo.
>>
>>
>> See
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull
>>
>> Can anybody please advise?
>>
>> Thanks in advance.
>>
>> Phillip
>>
>>
>>

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread Sean Owen
Shane would you know? May be a problem with a single worker.

On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry 
wrote:

>
> Hi,
>
> Silly question: the Jenkins build for my PR is failing but it seems
> outside of my control. What must I do to remedy this?
>
> I've submitted
>
> https://github.com/apache/spark/pull/31535
>
> but Spark QA is telling me "Kubernetes integration test status failure".
>
> The Jenkins job says "SUCCESS" but also barfs with:
>
> FileNotFoundException means that the credentials Jenkins is using is probably 
> wrong. Or the user account does not have write access to the repo.
>
>
> See
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull
>
> Can anybody please advise?
>
> Thanks in advance.
>
> Phillip
>
>
>


K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread Phillip Henry
Hi,

Silly question: the Jenkins build for my PR is failing but it seems outside
of my control. What must I do to remedy this?

I've submitted

https://github.com/apache/spark/pull/31535

but Spark QA is telling me "Kubernetes integration test status failure".

The Jenkins job says "SUCCESS" but also barfs with:

FileNotFoundException means that the credentials Jenkins is using is
probably wrong. Or the user account does not have write access to the
repo.


See
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39934/consoleFull

Can anybody please advise?

Thanks in advance.

Phillip


Re: Unit test failure in spark-core

2020-10-12 Thread Stephen Coy
Sorry, I forgot:

[scoy@Steves-Core-i9-2 core]$ java -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.262-b10, mixed mode)

which is on MacOS 10.15.7

On 13 Oct 2020, at 12:47 pm, Stephen Coy wrote:

Hi all,

When trying to build current master with a simple:

mvn clean install

I get a consistent unit test failure in core:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.403 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.015 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:274)

I believe the applicable messages from the unit-tests.log file are:

20/10/13 12:20:35.875 spark-app-1: '' WARN InProcessAppHandle: 
Application failed with exception.
org.apache.spark.SparkException: Failed to get main class in JAR with error 
'File spark-internal does not exist'.  Please specify one with --class.
at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:942)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:457)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.InProcessSparkSubmit$.main(SparkSubmit.scala:954)
at org.apache.spark.deploy.InProcessSparkSubmit.main(SparkSubmit.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.launcher.InProcessAppHandle.lambda$start$0(InProcessAppHandle.java:72)
at java.lang.Thread.run(Thread.java:748)


org.apache.spark.launcher.SparkLauncherSuite#testSparkLauncherGetError is the 
failing test, so I improved the failing assertion by changing it from:

  
assertTrue(handle.getError().get().getMessage().contains(EXCEPTION_MESSAGE));

to:

  assertThat(handle.getError().get().getMessage(), containsString(EXCEPTION_MESSAGE));

This yields:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.155 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.02 s  <<< FAILURE!
java.lang.AssertionError:

Expected: a string containing "dummy-exception"
 but: was "Error: Failed to load class 
org.apache.spark.launcher.SparkLauncherSuite$ErrorInProcessTestApp."
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:276)


Which loosely correlates with the error in unit-tests.log.

Any ideas?

Thanks,

Steve C






Unit test failure in spark-core

2020-10-12 Thread Stephen Coy
Hi all,

When trying to build current master with a simple:

mvn clean install

I get a consistent unit test failure in core:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.403 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.015 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:274)

I believe the applicable messages from the unit-tests.log file are:

20/10/13 12:20:35.875 spark-app-1: '' WARN InProcessAppHandle: 
Application failed with exception.
org.apache.spark.SparkException: Failed to get main class in JAR with error 
'File spark-internal does not exist'.  Please specify one with --class.
at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:942)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:457)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.InProcessSparkSubmit$.main(SparkSubmit.scala:954)
at org.apache.spark.deploy.InProcessSparkSubmit.main(SparkSubmit.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.launcher.InProcessAppHandle.lambda$start$0(InProcessAppHandle.java:72)
at java.lang.Thread.run(Thread.java:748)


org.apache.spark.launcher.SparkLauncherSuite#testSparkLauncherGetError is the 
failing test, so I improved the failing assertion by changing it from:

  
assertTrue(handle.getError().get().getMessage().contains(EXCEPTION_MESSAGE));

to:

  assertThat(handle.getError().get().getMessage(), containsString(EXCEPTION_MESSAGE));

This yields:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.155 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.02 s  <<< FAILURE!
java.lang.AssertionError:

Expected: a string containing "dummy-exception"
 but: was "Error: Failed to load class 
org.apache.spark.launcher.SparkLauncherSuite$ErrorInProcessTestApp."
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:276)


Which loosely correlates with the error in unit-tests.log.
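
For reference, the failing suite can be re-run in isolation (a sketch
assuming the stock Maven setup; exact flags may vary by Spark version):

    build/mvn test -pl core -DwildcardSuites=none \
        -Dtest=org.apache.spark.launcher.SparkLauncherSuite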

Any ideas?

Thanks,

Steve C





Test Failures on macOS 10.15.4

2020-09-23 Thread EveLiao
Hi,

I am new to Spark development. When I tried to run unit tests locally
on macOS 10.15.4, everything went smoothly except a single test case - the
SPARK-6330 regression test. After a few hours of struggling with it, I moved
to Linux and it passed magically. My OS is Ubuntu 18.04.

Digging into the code, I believe the intention of the test is to validate
that the distributed filesystem's scheme is interpreted from the file path
if no default filesystem is provided, and that it should avoid the exception
"IllegalArgumentException: Wrong FS: hdfs://..., expected: file:///".
Instead, the code goes further and hits errors like "UnknownHostException"
when connecting to the remote system, as it is a fake file path. However, the
test in my local environment broke because it throws another exception,
"java.lang.IllegalArgumentException: Pathname  from hdfs://nonexistent is
not a valid DFS filename.", when connecting to the remote.

The code is below:

  test("SPARK-6330 regression test") {
    // In 1.3.0, save to fs other than file: without configuring core-site.xml would get:
    // IllegalArgumentException: Wrong FS: hdfs://..., expected: file:///
    intercept[Throwable] {
      spark.read.parquet("file:///nonexistent")
    }
    val errorMessage = intercept[Throwable] {
      spark.read.parquet("hdfs://nonexistent")
    }.toString
    assert(errorMessage.contains("UnknownHostException"))
  }

I am wondering if anyone has seen the same broken test before. If so, what
tweaks did you make to get it to pass? It could be something I missed when
setting up my local environment. I was using Hadoop 3.2 and Hive 2.3.

If it is due to a discrepancy between operating systems, does it make sense
to change the test case to help local development? Though we have Jenkins,
we may still need to run tests locally sometimes. My proposals would be:

1. assert(!errorMessage.contains("Wrong FS"))
The risk is that a later version of Hadoop might change the content of the
error message.

2. assert(errorMessage.contains("UnknownHostException") ||
errorMessage.contains("not a valid DFS filename"))
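
In case it helps anyone reproduce this, the single test can be re-run in
isolation with something like the following (a sketch assuming the test
lives in ParquetQuerySuite under sql/core; adjust the suite name if it has
moved):

    build/sbt "sql/testOnly *ParquetQuerySuite -- -z \"SPARK-6330\""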

Any suggestions would be really appreciated. Thanks for your time!






Re: SQL test failures in PR builder?

2019-12-09 Thread Shane Knapp
yeah, totally weird.

i'm actually going to take this moment and clean up the build scripts
for both of these jobs.  there's a lot of years-old cruft that i'll
delete and make things more readable.

On Sun, Dec 8, 2019 at 7:50 PM Sean Owen  wrote:
>
> Hm, so they look pretty similar except for minor differences in the
> actual script run. Is there any reason this should be different? Would
> it be reasonable to try making the 'new' one work like the 'old' one
> if the former isn't working?
>
> But I still can't figure out why it causes the same odd error every
> time on this one PR, which is a minor change to tooltips in the UI. I
> haven't seen other manually-triggered PR builds fail this way. Really
> mysterious so far!
>
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4964/testReport/
>
>
> Old:
>
> #!/bin/bash
>
> set -e  # fail on any non-zero exit code
> set -x
>
> export AMPLAB_JENKINS=1
> export PATH="$PATH:/home/anaconda/envs/py3k/bin"
>
> # Prepend JAVA_HOME/bin to fix issue where Zinc's embedded SBT incremental
> # compiler seems to ignore our JAVA_HOME and use the system javac instead.
> export PATH="$JAVA_HOME/bin:$PATH"
>
> # Add a pre-downloaded version of Maven to the path so that we avoid the
> # flaky download step.
> export PATH="/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:$PATH"
>
> echo "fixing target dir permissions"
> # stupid hack by sknapp to ensure that the chmod always exits w/0 and
> # doesn't bork the script
> chmod -R +w target/* || true
>
> echo "running git clean -fdx"
> git clean -fdx
>
> # Configure per-build-executor Ivy caches to avoid SBT Ivy lock contention
> export HOME="/home/sparkivy/per-executor-caches/$EXECUTOR_NUMBER"
> mkdir -p "$HOME"
> export SBT_OPTS="-Duser.home=$HOME -Dsbt.ivy.home=$HOME/.ivy2"
> export SPARK_VERSIONS_SUITE_IVY_PATH="$HOME/.ivy2"
>
>
> ./dev/run-tests-jenkins
>
>
> # Hack to ensure that at least one JVM suite always runs in order to
> # prevent spurious errors from the Jenkins JUnit test reporter plugin
> ./build/sbt unsafe/test > /dev/null 2>&1
>
>
>
> New:
>
> #!/bin/bash
>
> set -e
> export AMPLAB_JENKINS=1
> export PATH="$PATH:/home/anaconda/envs/py3k/bin"
> git clean -fdx
>
> # Prepend JAVA_HOME/bin to fix issue where Zinc's embedded SBT incremental
> # compiler seems to ignore our JAVA_HOME and use the system javac instead.
> export PATH="$JAVA_HOME/bin:$PATH"
>
> # Add a pre-downloaded version of Maven to the path so that we avoid the
> # flaky download step.
> export PATH="/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:$PATH"
>
> # Configure per-build-executor Ivy caches to avoid SBT Ivy lock contention
> export HOME="/home/sparkivy/per-executor-caches/$EXECUTOR_NUMBER"
> mkdir -p "$HOME"
> export SBT_OPTS="-Duser.home=$HOME -Dsbt.ivy.home=$HOME/.ivy2"
> export SPARK_VERSIONS_SUITE_IVY_PATH="$HOME/.ivy2"
>
> # This is required for tests of backport patches.
> # We need to download the run-tests-codes.sh file because it's imported
> # by run-tests-jenkins.
> # When running tests on branch-1.0 (and earlier), the older version of
> # run-tests won't set CURRENT_BLOCK, so the Jenkins scripts will report
> # all failures as "some tests failed" rather than a more specific
> # error message.
> if [ ! -f "dev/run-tests-jenkins" ]; then
>   wget https://raw.githubusercontent.com/apache/spark/master/dev/run-tests-jenkins
>   wget https://raw.githubusercontent.com/apache/spark/master/dev/run-tests-codes.sh
>   mv run-tests-jenkins dev/
>   mv run-tests-codes.sh dev/
>   chmod 755 dev/run-tests-jenkins
>   chmod 755 dev/run-tests-codes.sh
> fi
>
> ./dev/run-tests-jenkins
>
>
> On Wed, Dec 4, 2019 at 5:53 PM Shane Knapp  wrote:
> >
> > ++yin huai for more insight in to the NewSparkPullRequestBuilder job...
> >
> > tbh, i never (or still) really understand the exact use for that job,
> > except that it's triggered by https://spark-prs.appspot.com/
> >
> > shane
> >
> >
> > On Wed, Dec 4, 2019 at 3:34 PM Sean Owen  wrote:
> > >
> > > BTW does anyone know why there are two PR builder jobs? I'm confused
> > > about why different ones would execute.
> > >
> > > Yes I see NewSparkPullRequestBuilder failing on a variety of PRs.
> > > I don't think it has anyth

Re: SQL test failures in PR builder?

2019-12-08 Thread Sean Owen
Hm, so they look pretty similar except for minor differences in the
actual script run. Is there any reason this should be different? Would
it be reasonable to try making the 'new' one work like the 'old' one
if the former isn't working?

But I still can't figure out why it causes the same odd error every
time on this one PR, which is a minor change to tooltips in the UI. I
haven't seen other manually-triggered PR builds fail this way. Really
mysterious so far!

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4964/testReport/


Old:

#!/bin/bash

set -e  # fail on any non-zero exit code
set -x

export AMPLAB_JENKINS=1
export PATH="$PATH:/home/anaconda/envs/py3k/bin"

# Prepend JAVA_HOME/bin to fix issue where Zinc's embedded SBT incremental
# compiler seems to ignore our JAVA_HOME and use the system javac instead.
export PATH="$JAVA_HOME/bin:$PATH"

# Add a pre-downloaded version of Maven to the path so that we avoid the
# flaky download step.
export PATH="/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:$PATH"

echo "fixing target dir permissions"
# stupid hack by sknapp to ensure that the chmod always exits w/0 and
# doesn't bork the script
chmod -R +w target/* || true

echo "running git clean -fdx"
git clean -fdx

# Configure per-build-executor Ivy caches to avoid SBT Ivy lock contention
export HOME="/home/sparkivy/per-executor-caches/$EXECUTOR_NUMBER"
mkdir -p "$HOME"
export SBT_OPTS="-Duser.home=$HOME -Dsbt.ivy.home=$HOME/.ivy2"
export SPARK_VERSIONS_SUITE_IVY_PATH="$HOME/.ivy2"


./dev/run-tests-jenkins


# Hack to ensure that at least one JVM suite always runs in order to
# prevent spurious errors from the Jenkins JUnit test reporter plugin
./build/sbt unsafe/test > /dev/null 2>&1



New:

#!/bin/bash

set -e
export AMPLAB_JENKINS=1
export PATH="$PATH:/home/anaconda/envs/py3k/bin"
git clean -fdx

# Prepend JAVA_HOME/bin to fix issue where Zinc's embedded SBT incremental
# compiler seems to ignore our JAVA_HOME and use the system javac instead.
export PATH="$JAVA_HOME/bin:$PATH"

# Add a pre-downloaded version of Maven to the path so that we avoid the
# flaky download step.
export PATH="/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:$PATH"

# Configure per-build-executor Ivy caches to avoid SBT Ivy lock contention
export HOME="/home/sparkivy/per-executor-caches/$EXECUTOR_NUMBER"
mkdir -p "$HOME"
export SBT_OPTS="-Duser.home=$HOME -Dsbt.ivy.home=$HOME/.ivy2"
export SPARK_VERSIONS_SUITE_IVY_PATH="$HOME/.ivy2"

# This is required for tests of backport patches.
# We need to download the run-tests-codes.sh file because it's imported
# by run-tests-jenkins.
# When running tests on branch-1.0 (and earlier), the older version of
# run-tests won't set CURRENT_BLOCK, so the Jenkins scripts will report
# all failures as "some tests failed" rather than a more specific
# error message.
if [ ! -f "dev/run-tests-jenkins" ]; then
  wget https://raw.githubusercontent.com/apache/spark/master/dev/run-tests-jenkins
  wget https://raw.githubusercontent.com/apache/spark/master/dev/run-tests-codes.sh
  mv run-tests-jenkins dev/
  mv run-tests-codes.sh dev/
  chmod 755 dev/run-tests-jenkins
  chmod 755 dev/run-tests-codes.sh
fi

./dev/run-tests-jenkins


On Wed, Dec 4, 2019 at 5:53 PM Shane Knapp  wrote:
>
> ++yin huai for more insight into the NewSparkPullRequestBuilder job...
>
> tbh, i never really understood (and still don't) the exact use for that job,
> except that it's triggered by https://spark-prs.appspot.com/
>
> shane
>
>
> On Wed, Dec 4, 2019 at 3:34 PM Sean Owen  wrote:
> >
> > BTW does anyone know why there are two PR builder jobs? I'm confused
> > about why different ones would execute.
> >
> > Yes I see NewSparkPullRequestBuilder failing on a variety of PRs.
> > I don't think it has anything to do with Hive; these PRs touch
> > different parts of code but all not related to this failure.
> >
> > On Wed, Dec 4, 2019 at 12:40 PM Dongjoon Hyun  
> > wrote:
> > >
> > > Hi, Sean.
> > >
> > > It seems that there is no failure on your other SQL PR.
> > >
> > > https://github.com/apache/spark/pull/26748
> > >
> > > Does the sequential failure happen only at `NewSparkPullRequestBuilder`?
> > > Since `NewSparkPullRequestBuilder` is not the same with 
> > > `SparkPullRequestBuilder`,
> > > there might be a root cause inside it if it happens only at 
> > > `NewSparkPullRequestBuilder`.
> > >
> > > For `org.apache.hive.service.ServiceException: Failed to Start 
> > > Hive

Re: SQL test failures in PR builder?

2019-12-04 Thread Shane Knapp
++yin huai for more insight into the NewSparkPullRequestBuilder job...

tbh, i never really understood (and still don't) the exact use for that job,
except that it's triggered by https://spark-prs.appspot.com/

shane


On Wed, Dec 4, 2019 at 3:34 PM Sean Owen  wrote:
>
> BTW does anyone know why there are two PR builder jobs? I'm confused
> about why different ones would execute.
>
> Yes I see NewSparkPullRequestBuilder failing on a variety of PRs.
> I don't think it has anything to do with Hive; these PRs touch
> different parts of the code, none of which are related to this failure.
>
> On Wed, Dec 4, 2019 at 12:40 PM Dongjoon Hyun  wrote:
> >
> > Hi, Sean.
> >
> > It seems that there is no failure on your other SQL PR.
> >
> > https://github.com/apache/spark/pull/26748
> >
> > Does the sequential failure happen only at `NewSparkPullRequestBuilder`?
> > Since `NewSparkPullRequestBuilder` is not the same with 
> > `SparkPullRequestBuilder`,
> > there might be a root cause inside it if it happens only at 
> > `NewSparkPullRequestBuilder`.
> >
> > For `org.apache.hive.service.ServiceException: Failed to Start HiveServer2`,
> > I've observed them before, but the root cause might be different from this 
> > one.
> >
> > BTW, to reduce the scope of investigation, could you try with `[hive-1.2]` 
> > tag in your PR?
> >
> > Bests,
> > Dongjoon.
> >
> >
> > On Wed, Dec 4, 2019 at 6:29 AM Sean Owen  wrote:
> >>
> >> I'm seeing consistent failures in the PR builder when touching SQL code:
> >>
> >> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4960/testReport/
> >>
> >>  
> >> org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.Spark's 
> >> own GetSchemasOperation(SparkGetSchemasOperation)14 ms2
> >>  
> >> org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.(It
> >>  is not a test it is a sbt.testing.SuiteSelector)
> >>
> >> Looks like this has failed about 6 builds in the past few days. Has anyone 
> >> seen this / has a clue what's causing it? errors are like ...
> >>
> >> java.sql.SQLException: No suitable driver found for 
> >> jdbc:hive2://localhost:13694/?a=avalue;b=bvalue#c=cvalue;d=dvalue
> >>
> >>
> >> Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: class 
> >> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not 
> >> org.apache.hadoop.hive.metastore.MetaStoreFilterHook



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu




Re: SQL test failures in PR builder?

2019-12-04 Thread Sean Owen
BTW does anyone know why there are two PR builder jobs? I'm confused
about why different ones would execute.

Yes I see NewSparkPullRequestBuilder failing on a variety of PRs.
I don't think it has anything to do with Hive; these PRs touch
different parts of the code, none of which are related to this failure.

On Wed, Dec 4, 2019 at 12:40 PM Dongjoon Hyun  wrote:
>
> Hi, Sean.
>
> It seems that there is no failure on your other SQL PR.
>
> https://github.com/apache/spark/pull/26748
>
> Does the sequential failure happen only at `NewSparkPullRequestBuilder`?
> Since `NewSparkPullRequestBuilder` is not the same as
> `SparkPullRequestBuilder`,
> there might be a root cause inside it if it happens only at 
> `NewSparkPullRequestBuilder`.
>
> For `org.apache.hive.service.ServiceException: Failed to Start HiveServer2`,
> I've observed them before, but the root cause might be different from this 
> one.
>
> BTW, to reduce the scope of investigation, could you try with `[hive-1.2]` 
> tag in your PR?
>
> Bests,
> Dongjoon.
>
>
> On Wed, Dec 4, 2019 at 6:29 AM Sean Owen  wrote:
>>
>> I'm seeing consistent failures in the PR builder when touching SQL code:
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4960/testReport/
>>
>>  org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.Spark's 
>> own GetSchemasOperation(SparkGetSchemasOperation)14 ms2
>>  
>> org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.(It 
>> is not a test it is a sbt.testing.SuiteSelector)
>>
>> Looks like this has failed about 6 builds in the past few days. Has anyone 
>> seen this / has a clue what's causing it? errors are like ...
>>
>> java.sql.SQLException: No suitable driver found for 
>> jdbc:hive2://localhost:13694/?a=avalue;b=bvalue#c=cvalue;d=dvalue
>>
>>
>> Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: class 
>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not 
>> org.apache.hadoop.hive.metastore.MetaStoreFilterHook




Re: SQL test failures in PR builder?

2019-12-04 Thread Dongjoon Hyun
Hi, Sean.

It seems that there is no failure on your other SQL PR.

https://github.com/apache/spark/pull/26748

Does the sequential failure happen only at `NewSparkPullRequestBuilder`?
Since `NewSparkPullRequestBuilder` is not the same as
`SparkPullRequestBuilder`,
there might be a root cause inside it if it happens only at
`NewSparkPullRequestBuilder`.

For `org.apache.hive.service.ServiceException: Failed to Start HiveServer2`,
I've observed these errors before, but the root cause might be different from
this one.

BTW, to reduce the scope of the investigation, could you try with the
`[hive-1.2]` tag in your PR?

Bests,
Dongjoon.


On Wed, Dec 4, 2019 at 6:29 AM Sean Owen  wrote:

> I'm seeing consistent failures in the PR builder when touching SQL code:
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4960/testReport/
>
>  org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.Spark's
> own GetSchemasOperation(SparkGetSchemasOperation)14 ms2
>  org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.(It
> is not a test it is a sbt.testing.SuiteSelector)
>
> Looks like this has failed about 6 builds in the past few days. Has anyone
> seen this / has a clue what's causing it? errors are like ...
>
> java.sql.SQLException: No suitable driver found for 
> jdbc:hive2://localhost:13694/?a=avalue;b=bvalue#c=cvalue;d=dvalue
>
>
> Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: class 
> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not 
> org.apache.hadoop.hive.metastore.MetaStoreFilterHook
>
>


SQL test failures in PR builder?

2019-12-04 Thread Sean Owen
I'm seeing consistent failures in the PR builder when touching SQL code:

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4960/testReport/

 org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.Spark's
own GetSchemasOperation(SparkGetSchemasOperation)14 ms2
 org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.(It
is not a test it is a sbt.testing.SuiteSelector)

Looks like this has failed about 6 builds in the past few days. Has anyone
seen this / has a clue what's causing it? errors are like ...

java.sql.SQLException: No suitable driver found for
jdbc:hive2://localhost:13694/?a=avalue;b=bvalue#c=cvalue;d=dvalue


Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: class
org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not
org.apache.hadoop.hive.metastore.MetaStoreFilterHook


Re: Adding JIRA ID as the prefix for the test case name

2019-11-21 Thread Hyukjin Kwon
I opened a PR - https://github.com/apache/spark-website/pull/232

On Tue, Nov 19, 2019 at 9:22 AM Hyukjin Kwon wrote:

> Let me document this as below in a few days:
>
> 1. For Python and Java, write a single comment that starts with the JIRA ID
> and a short description, e.g. (SPARK-X: test blah blah)
> 2. For R, use the JIRA ID as a prefix for its test name.
>
> assuming everybody is happy.
>
> On Mon, Nov 18, 2019 at 11:36 AM Hyukjin Kwon wrote:
>
>> Actually there are not so many Java test cases in Spark (because Scala
>> runs on JVM as everybody knows)[1].
>>
>> Given that, I think we can avoid to put some efforts on this for now .. I
>> don't mind if somebody wants to give a shot since it looks good anyway but
>> to me I wouldn't spend so much time on this ..
>>
>> Let me just go ahead as I suggested if you don't mind. Anyone can give a
>> shot for Display Name - I'm willing to actively review and help.
>>
>> [1]
>> git ls-files '*Suite.java' | wc -l
>>  172
>> git ls-files '*Suite.scala' | wc -l
>> 1161
>>
>> On Mon, Nov 18, 2019 at 3:27 AM Steve Loughran wrote:
>>
>>> Test reporters do often contain some assumptions about the characters in
>>> the test methods. Historically JUnit XML reporters have never sanitised the
>>> method names so XML injection attacks have been fairly trivial. Haven't
>>> tried this for a while.
>>>
>>> That whole JUnit XML report "standard" was actually put together in the
>>> Ant project, with the junitreport task doing the postprocessing of the
>>> JUnit run. It was driven more by the team's XSL skills than by any
>>> overarching strategic goal about how to present the results of tests
>>> which could run for hours, and whose output you would really want to
>>> aggregate - the logs from multiple machines and processes - and present
>>> in a way you can actually navigate. With hindsight, a key failing is that
>>> we chose to store the test summaries (test count, failure count...) as
>>> attributes on the root XML node. Which is why the whole DOM gets built up
>>> in the JUnit runner. Which is why when that JUnit process crashes, you
>>> get no report at all.
>>>
>>> It'd be straightforward to fix - except too much relies on that file
>>> now... important things will break. And the Maven runner has historically
>>> never supported custom reporters that would let you experiment with it.
>>>
>>> Maybe this is an opportunity to change things.
>>>
>>> On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon 
>>> wrote:
>>>
>>>> DisplayName looks good in general but actually here I would like first
>>>> to find a existing pattern to document in guidelines given the actual
>>>> existing practice we all are used to. I'm trying to be very conservative
>>>> since this guidelines affect everybody.
>>>>
>>>> I think it might be better to discuss separately if we want to change
>>>> what we have been used to.
>>>>
>>>> Also, using arbitrary names might not be actually free due to such bug
>>>> like https://github.com/apache/spark/pull/25630 . It will need some
>>>> more efforts to investigate as well.
>>>>
>>>> On Fri, 15 Nov 2019, 20:56 Steve Loughran, 
>>>> wrote:
>>>>
>>>>>  Junit5: Display names.
>>>>>
>>>>> Goes all the way to the XML.
>>>>>
>>>>>
>>>>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names
>>>>>
>>>>> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <
>>>>> shixi...@databricks.com> wrote:
>>>>>
>>>>>> Should we also add a guideline for non Scala tests? Other languages
>>>>>> (Java, Python, R) don't support using string as a test name.
>>>>>>
>>>>>> Best Regards,
>>>>>> Ryan
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon 
>>>>>> wrote:
>>>>>>
>>>>>>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>>>>>>
>>>>>>> On Wed, Nov 13, 2019 at 10:43 AM Hyukjin Kwon wrote:
>>>>>>>
>>>>>>>> > In general a test should be self descriptive and I don't think we
>>>>>>>> should be adding JIRA ticket references wholesal

Re: Adding JIRA ID as the prefix for the test case name

2019-11-18 Thread Hyukjin Kwon
Let me document this as below in a few days:

1. For Python and Java, write a single comment that starts with the JIRA ID
and a short description, e.g. (SPARK-X: test blah blah)
2. For R, use the JIRA ID as a prefix for its test name.

assuming everybody is happy.

On Mon, Nov 18, 2019 at 11:36 AM Hyukjin Kwon wrote:

> Actually there are not so many Java test cases in Spark (because Scala
> runs on the JVM, as everybody knows)[1].
>
> Given that, I think we can avoid putting effort into this for now. I don't
> mind if somebody wants to give it a shot since it looks good anyway, but I
> wouldn't spend much time on this myself.
>
> Let me just go ahead as I suggested if you don't mind. Anyone can give
> Display Name a shot - I'm willing to actively review and help.
>
> [1]
> git ls-files '*Suite.java' | wc -l
>  172
> git ls-files '*Suite.scala' | wc -l
> 1161
>
> On Mon, Nov 18, 2019 at 3:27 AM Steve Loughran wrote:
>
>> Test reporters do often contain some assumptions about the characters in
>> the test methods. Historically JUnit XML reporters have never sanitised the
>> method names so XML injection attacks have been fairly trivial. Haven't
>> tried this for a while.
>>
>> That whole JUnit XML report "standard" was actually put together in the
>> Ant project, with the junitreport task doing the postprocessing of the
>> JUnit run. It was driven more by the team's XSL skills than by any
>> overarching strategic goal about how to present the results of tests which
>> could run for hours, and whose output you would really want to aggregate -
>> the logs from multiple machines and processes - and present in a way you
>> can actually navigate. With hindsight, a key failing is that we chose to
>> store the test summaries (test count, failure count...) as attributes on
>> the root XML node. Which is why the whole DOM gets built up in the JUnit
>> runner. Which is why when that JUnit process crashes, you get no report at
>> all.
>>
>> It'd be straightforward to fix - except too much relies on that file
>> now... important things will break. And the Maven runner has historically
>> never supported custom reporters that would let you experiment with it.
>>
>> Maybe this is an opportunity to change things.
>>
>> On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon  wrote:
>>
>>> DisplayName looks good in general but actually here I would like first
>>> to find a existing pattern to document in guidelines given the actual
>>> existing practice we all are used to. I'm trying to be very conservative
>>> since this guidelines affect everybody.
>>>
>>> I think it might be better to discuss separately if we want to change
>>> what we have been used to.
>>>
>>> Also, using arbitrary names might not be actually free due to such bug
>>> like https://github.com/apache/spark/pull/25630 . It will need some
>>> more efforts to investigate as well.
>>>
>>> On Fri, 15 Nov 2019, 20:56 Steve Loughran, 
>>> wrote:
>>>
>>>>  Junit5: Display names.
>>>>
>>>> Goes all the way to the XML.
>>>>
>>>>
>>>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names
>>>>
>>>> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <
>>>> shixi...@databricks.com> wrote:
>>>>
>>>>> Should we also add a guideline for non Scala tests? Other languages
>>>>> (Java, Python, R) don't support using string as a test name.
>>>>>
>>>>> Best Regards,
>>>>> Ryan
>>>>>
>>>>>
>>>>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon 
>>>>> wrote:
>>>>>
>>>>>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>>>>>
>>>>>> On Wed, Nov 13, 2019 at 10:43 AM Hyukjin Kwon wrote:
>>>>>>
>>>>>>> > In general a test should be self descriptive and I don't think we
>>>>>>> should be adding JIRA ticket references wholesale. Any action that the
>>>>>>> reader has to take to understand why a test was introduced is one too 
>>>>>>> many.
>>>>>>> However in some cases the thing we are trying to test is very subtle 
>>>>>>> and in
>>>>>>> that case a reference to a JIRA ticket might be useful, I do still feel
>>>>>>> that this should be a backstop and that properly documenting your tests 
>&g

Re: Adding JIRA ID as the prefix for the test case name

2019-11-17 Thread Hyukjin Kwon
Actually there are not so many Java test cases in Spark (because Scala runs
on the JVM, as everybody knows)[1].

Given that, I think we can avoid putting effort into this for now. I don't
mind if somebody wants to give it a shot since it looks good anyway, but I
wouldn't spend much time on this myself.

Let me just go ahead as I suggested if you don't mind. Anyone can give
Display Name a shot - I'm willing to actively review and help.

[1]
git ls-files '*Suite.java' | wc -l
 172
git ls-files '*Suite.scala' | wc -l
1161

On Mon, Nov 18, 2019 at 3:27 AM Steve Loughran wrote:

> Test reporters do often contain some assumptions about the characters in
> the test methods. Historically JUnit XML reporters have never sanitised the
> method names so XML injection attacks have been fairly trivial. Haven't
> tried this for a while.
>
> That whole JUnit XML report "standard" was actually put together in the
> Ant project, with the junitreport task doing the postprocessing of the
> JUnit run. It was driven more by the team's XSL skills than by any
> overarching strategic goal about how to present the results of tests which
> could run for hours, and whose output you would really want to aggregate -
> the logs from multiple machines and processes - and present in a way you
> can actually navigate. With hindsight, a key failing is that we chose to
> store the test summaries (test count, failure count...) as attributes on
> the root XML node. Which is why the whole DOM gets built up in the JUnit
> runner. Which is why when that JUnit process crashes, you get no report at
> all.
>
> It'd be straightforward to fix - except too much relies on that file
> now... important things will break. And the Maven runner has historically
> never supported custom reporters that would let you experiment with it.
>
> Maybe this is an opportunity to change things.
>
> On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon  wrote:
>
>> DisplayName looks good in general but actually here I would like first to
>> find a existing pattern to document in guidelines given the actual existing
>> practice we all are used to. I'm trying to be very conservative since this
>> guidelines affect everybody.
>>
>> I think it might be better to discuss separately if we want to change
>> what we have been used to.
>>
>> Also, using arbitrary names might not be actually free due to such bug
>> like https://github.com/apache/spark/pull/25630 . It will need some more
>> efforts to investigate as well.
>>
>> On Fri, 15 Nov 2019, 20:56 Steve Loughran, 
>> wrote:
>>
>>>  Junit5: Display names.
>>>
>>> Goes all the way to the XML.
>>>
>>>
>>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names
>>>
>>> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <
>>> shixi...@databricks.com> wrote:
>>>
>>>> Should we also add a guideline for non Scala tests? Other languages
>>>> (Java, Python, R) don't support using string as a test name.
>>>>
>>>> Best Regards,
>>>> Ryan
>>>>
>>>>
>>>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon 
>>>> wrote:
>>>>
>>>>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>>>>
>>>>> On Wed, Nov 13, 2019 at 10:43 AM Hyukjin Kwon wrote:
>>>>>
>>>>>> > In general a test should be self descriptive and I don't think we
>>>>>> should be adding JIRA ticket references wholesale. Any action that the
>>>>>> reader has to take to understand why a test was introduced is one too 
>>>>>> many.
>>>>>> However in some cases the thing we are trying to test is very subtle and 
>>>>>> in
>>>>>> that case a reference to a JIRA ticket might be useful, I do still feel
>>>>>> that this should be a backstop and that properly documenting your tests 
>>>>>> is
>>>>>> a much better way of dealing with this.
>>>>>>
>>>>>> Yeah, the test should be self-descriptive. I don't think adding a
>>>>>> JIRA prefix harms this point. Probably I should add this sentence in the
>>>>>> guidelines as well.
>>>>>> Adding a JIRA prefix just adds one extra hint to track down details.
>>>>>> I think it's fine to stick to this practice and make it simpler and clear
>>>>>> to follow.
>>>>>>
>>>>>> > 1. what if multiple JIRA ID

Re: Adding JIRA ID as the prefix for the test case name

2019-11-17 Thread Steve Loughran
Test reporters do often contain some assumptions about the characters in
the test methods. Historically JUnit XML reporters have never sanitised the
method names so XML injection attacks have been fairly trivial. Haven't
tried this for a while.

That whole JUnit XML report "standard" was actually put together in the Ant
project, with the junitreport task doing the postprocessing of the JUnit
run. It was driven more by the team's XSL skills than by any overarching
strategic goal about how to present the results of tests which could run
for hours, and whose output you would really want to aggregate - the logs
from multiple machines and processes - and present in a way you can actually
navigate. With hindsight, a key failing is that we chose to store the test
summaries (test count, failure count...) as attributes on the root XML node.
Which is why the whole DOM gets built up in the JUnit runner. Which is why
when that JUnit process crashes, you get no report at all.

It'd be straightforward to fix - except too much relies on that file now...
important things will break. And the Maven runner has historically never
supported custom reporters that would let you experiment with it.

Maybe this is an opportunity to change things.

On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon  wrote:

> DisplayName looks good in general, but here I would first like to find an
> existing pattern to document in the guidelines, given the actual existing
> practice we are all used to. I'm trying to be very conservative since these
> guidelines affect everybody.
>
> I think it might be better to discuss separately if we want to change what
> we have been used to.
>
> Also, using arbitrary names might not actually be free, due to bugs like
> https://github.com/apache/spark/pull/25630 . It will need some more
> effort to investigate as well.
>
> On Fri, 15 Nov 2019, 20:56 Steve Loughran, 
> wrote:
>
>>  Junit5: Display names.
>>
>> Goes all the way to the XML.
>>
>>
>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names
>>
>> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <
>> shixi...@databricks.com> wrote:
>>
>>> Should we also add a guideline for non Scala tests? Other languages
>>> (Java, Python, R) don't support using string as a test name.
>>>
>>> Best Regards,
>>> Ryan
>>>
>>>
>>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon 
>>> wrote:
>>>
>>>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>>>
>>>> On Wed, Nov 13, 2019 at 10:43 AM Hyukjin Kwon wrote:
>>>>
>>>>> > In general a test should be self descriptive and I don't think we
>>>>> should be adding JIRA ticket references wholesale. Any action that the
>>>>> reader has to take to understand why a test was introduced is one too 
>>>>> many.
>>>>> However in some cases the thing we are trying to test is very subtle and 
>>>>> in
>>>>> that case a reference to a JIRA ticket might be useful, I do still feel
>>>>> that this should be a backstop and that properly documenting your tests is
>>>>> a much better way of dealing with this.
>>>>>
>>>>> Yeah, the test should be self-descriptive. I don't think adding a JIRA
>>>>> prefix harms this point. Probably I should add this sentence in the
>>>>> guidelines as well.
>>>>> Adding a JIRA prefix just adds one extra hint to track down details. I
>>>>> think it's fine to stick to this practice and make it simpler and clear to
>>>>> follow.
>>>>>
>>>>> > 1. what if multiple JIRA IDs relating to the same test? we just take
>>>>> the very first JIRA ID?
>>>>> Ideally one JIRA should describe one issue and one PR should fix one
>>>>> JIRA with a dedicated test.
>>>>> Yeah, I think I would take the very first JIRA ID.
>>>>>
>>>>> > 2. are we going to have a full scan of all existing tests and attach
>>>>> a JIRA ID to it?
>>>>> Yea, let's don't do this.
>>>>>
>>>>> > It's a nice-to-have, not super essential, just because ...
>>>>> It's been asked multiple times and each committer seems having a
>>>>> different understanding on this.
>>>>> It's not a biggie but wanted to make it clear and conclude this.
>>>>>
>>>>> > I'd add this only when a test specifically targets a certain issue.
>>>>> Yes, so t

Re: Adding JIRA ID as the prefix for the test case name

2019-11-16 Thread Hyukjin Kwon
DisplayName looks good in general, but here I would first like to find an
existing pattern to document in the guidelines, given the actual existing
practice we are all used to. I'm trying to be very conservative since these
guidelines affect everybody.

I think it might be better to discuss separately if we want to change what
we have been used to.

Also, using arbitrary names might not actually be free, due to bugs like
https://github.com/apache/spark/pull/25630 . It will need some more effort
to investigate as well.

On Fri, 15 Nov 2019, 20:56 Steve Loughran, 
wrote:

>  Junit5: Display names.
>
> Goes all the way to the XML.
>
>
> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names
>
> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Should we also add a guideline for non Scala tests? Other languages
>> (Java, Python, R) don't support using string as a test name.
>>
>> Best Regards,
>> Ryan
>>
>>
>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon  wrote:
>>
>>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>>
>>> On Wed, Nov 13, 2019 at 10:43 AM Hyukjin Kwon wrote:
>>>
>>>> > In general a test should be self descriptive and I don't think we
>>>> should be adding JIRA ticket references wholesale. Any action that the
>>>> reader has to take to understand why a test was introduced is one too many.
>>>> However in some cases the thing we are trying to test is very subtle and in
>>>> that case a reference to a JIRA ticket might be useful, I do still feel
>>>> that this should be a backstop and that properly documenting your tests is
>>>> a much better way of dealing with this.
>>>>
>>>> Yeah, the test should be self-descriptive. I don't think adding a JIRA
>>>> prefix harms this point. Probably I should add this sentence in the
>>>> guidelines as well.
>>>> Adding a JIRA prefix just adds one extra hint to track down details. I
>>>> think it's fine to stick to this practice and make it simpler and clear to
>>>> follow.
>>>>
>>>> > 1. what if multiple JIRA IDs relating to the same test? we just take
>>>> the very first JIRA ID?
>>>> Ideally one JIRA should describe one issue and one PR should fix one
>>>> JIRA with a dedicated test.
>>>> Yeah, I think I would take the very first JIRA ID.
>>>>
>>>> > 2. are we going to have a full scan of all existing tests and attach
>>>> a JIRA ID to it?
>>>> Yea, let's don't do this.
>>>>
>>>> > It's a nice-to-have, not super essential, just because ...
>>>> It's been asked multiple times and each committer seems having a
>>>> different understanding on this.
>>>> It's not a biggie but wanted to make it clear and conclude this.
>>>>
>>>> > I'd add this only when a test specifically targets a certain issue.
>>>> Yes, so this one I am not sure. From what I heard, people adds the JIRA
>>>> in cases below:
>>>>
>>>> - Whenever the JIRA type is a bug
>>>> - When a PR adds a couple of tests
>>>> - Only when a test specifically targets a certain issue.
>>>> - ...
>>>>
>>>> Which one do we prefer and simpler to follow?
>>>>
>>>> Or I can combine as below (im gonna reword when I actually document
>>>> this):
>>>> 1. In general, we should add a JIRA ID as prefix of a test when a PR
>>>> targets to fix a specific issue.
>>>> In practice, it usually happens when a JIRA type is a bug or a PR
>>>> adds a couple of tests.
>>>> 2. Uses "SPARK-: test name" format
>>>>
>>>> If we have no objection with ^, let me go with this.
>>>>
>>>> On Wed, Nov 13, 2019 at 8:14 AM Sean Owen wrote:
>>>>
>>>>> Let's suggest "SPARK-12345:" but not go back and change a bunch of
>>>>> test cases.
>>>>> I'd add this only when a test specifically targets a certain issue.
>>>>> It's a nice-to-have, not super essential, just because in the rare
>>>>> case you need to understand why a test asserts something, you can go
>>>>> back and find what added it in the git history without much trouble.
>>>>>
>>>>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon 
>>

Re: Adding JIRA ID as the prefix for the test case name

2019-11-15 Thread Steve Loughran
 JUnit 5: Display names.

Goes all the way to the XML.

https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names

On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu 
wrote:

> Should we also add a guideline for non-Scala tests? Other languages (Java,
> Python, R) don't support using a string as a test name.
>
> Best Regards,
> Ryan
>
>
> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon  wrote:
>
>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>
>> On Wed, Nov 13, 2019 at 10:43 AM, Hyukjin Kwon wrote:
>>
>>> > In general a test should be self descriptive and I don't think we
>>> should be adding JIRA ticket references wholesale. Any action that the
>>> reader has to take to understand why a test was introduced is one too many.
>>> However in some cases the thing we are trying to test is very subtle and in
>>> that case a reference to a JIRA ticket might be useful, I do still feel
>>> that this should be a backstop and that properly documenting your tests is
>>> a much better way of dealing with this.
>>>
>>> Yeah, the test should be self-descriptive. I don't think adding a JIRA
>>> prefix harms this point. Probably I should add this sentence in the
>>> guidelines as well.
>>> Adding a JIRA prefix just adds one extra hint to track down details. I
>>> think it's fine to stick to this practice and make it simpler and
>>> clearer to follow.
>>>
>>> > 1. What if multiple JIRA IDs relate to the same test? Do we just take
>>> the very first JIRA ID?
>>> Ideally one JIRA should describe one issue, and one PR should fix one
>>> JIRA with a dedicated test.
>>> Yeah, I think I would take the very first JIRA ID.
>>>
>>> > 2. Are we going to do a full scan of all existing tests and attach a
>>> JIRA ID to each?
>>> Yea, let's not do this.
>>>
>>> > It's a nice-to-have, not super essential, just because ...
>>> It's been asked multiple times, and each committer seems to have a
>>> different understanding of this.
>>> It's not a biggie, but I wanted to make it clear and conclude this.
>>>
>>> > I'd add this only when a test specifically targets a certain issue.
>>> Yes, this one I am not sure about. From what I heard, people add the
>>> JIRA in the cases below:
>>>
>>> - Whenever the JIRA type is a bug
>>> - When a PR adds a couple of tests
>>> - Only when a test specifically targets a certain issue.
>>> - ...
>>>
>>> Which one do we prefer, and which is simpler to follow?
>>>
>>> Or I can combine them as below (I'll reword it when I actually document
>>> this):
>>> 1. In general, we should add a JIRA ID as a prefix of a test name when a
>>> PR targets a specific issue.
>>> In practice, this usually happens when the JIRA type is a bug or a PR
>>> adds a couple of tests.
>>> 2. Use the "SPARK-XXXXX: test name" format.
>>>
>>> If we have no objection with ^, let me go with this.
>>>
>>> On Wed, Nov 13, 2019 at 8:14 AM, Sean Owen wrote:
>>>
>>>> Let's suggest "SPARK-12345:" but not go back and change a bunch of test
>>>> cases.
>>>> I'd add this only when a test specifically targets a certain issue.
>>>> It's a nice-to-have, not super essential, just because in the rare
>>>> case you need to understand why a test asserts something, you can go
>>>> back and find what added it in the git history without much trouble.
>>>>
>>>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon 
>>>> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Maybe it's not a big deal, but it has brought some confusion from time
>>>> to time to the Spark dev community. I think it's time to discuss when,
>>>> and in which format, to add a JIRA ID as a prefix for the test case name
>>>> in Scala test cases.
>>>> >
>>>> > Currently we have many test case names with prefixes as below:
>>>> >
>>>> > test("SPARK-X blah blah")
>>>> > test("SPARK-X: blah blah")
>>>> > test("SPARK-X - blah blah")
>>>> > test("[SPARK-X] blah blah")
>>>> > …
>>>> >
>>>> > It is a good practice to have the JIRA ID in general because, for
>>>> instance,
>>>> >

Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Felix Cheung
this is about the test description and not the test file name, right?

if yes, I don't see a problem.


From: Hyukjin Kwon 
Sent: Thursday, November 14, 2019 6:03:02 PM
To: Shixiong(Ryan) Zhu 
Cc: dev ; Felix Cheung ; 
Shivaram Venkataraman 
Subject: Re: Adding JIRA ID as the prefix for the test case name

Yeah, sounds good to have it.

In the case of R, it seems not quite common to write down the JIRA ID [1],
but it looks like some tests have the prefix in their names in general.
In the case of Python and Java, it seems we write a JIRA ID from time to
time in a comment right under the test method [2][3].

Given this pattern, I would like to suggest using the same format, but:

1. For Python and Java, write a single comment that starts with the JIRA ID
and a short description, e.g. (SPARK-XXXXX: test blah blah)
2. For R, use the JIRA ID as a prefix for the test name.

[1] git grep -r "SPARK-" -- '*test*.R'
[2] git grep -r "SPARK-" -- '*Suite.java'
[3] git grep -r "SPARK-" -- '*test*.py'

Does that make sense? Adding Felix and Shivaram too.


On Fri, Nov 15, 2019 at 3:13 AM, Shixiong(Ryan) Zhu wrote:
Should we also add a guideline for non-Scala tests? Other languages (Java,
Python, R) don't support using a string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon wrote:
I opened a PR - https://github.com/apache/spark-website/pull/231

On Wed, Nov 13, 2019 at 10:43 AM, Hyukjin Kwon wrote:
> In general a test should be self descriptive and I don't think we should be 
> adding JIRA ticket references wholesale. Any action that the reader has to 
> take to understand why a test was introduced is one too many. However in some 
> cases the thing we are trying to test is very subtle and in that case a 
> reference to a JIRA ticket might be useful, I do still feel that this should 
> be a backstop and that properly documenting your tests is a much better way 
> of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix 
harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think 
it's fine to stick to this practice and make it simpler and clearer to follow.

> 1. What if multiple JIRA IDs relate to the same test? Do we just take the
> very first JIRA ID?
Ideally one JIRA should describe one issue, and one PR should fix one JIRA
with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. Are we going to do a full scan of all existing tests and attach a JIRA
> ID to each?
Yea, let's not do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times, and each committer seems to have a different
understanding of this.
It's not a biggie, but I wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, this one I am not sure about. From what I heard, people add the JIRA in
the cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer, and which is simpler to follow?

Or I can combine them as below (I'll reword it when I actually document this):
1. In general, we should add a JIRA ID as a prefix of a test name when a PR
targets a specific issue.
In practice, this usually happens when the JIRA type is a bug or a PR adds
a couple of tests.
2. Use the "SPARK-XXXXX: test name" format.

If we have no objection with ^, let me go with this.

On Wed, Nov 13, 2019 at 8:14 AM, Sean Owen wrote:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon 
mailto:gurwls...@gmail.com>> wrote:
>
> Hi all,
>
> Maybe it's not a big deal, but it has brought some confusion from time to
> time to the Spark dev community. I think it's time to discuss when, and in
> which format, to add a JIRA ID as a prefix for the test case name in Scala
> test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-X blah blah")
> test("SPARK-X: blah blah")
> test("SPARK-X - blah blah")
> test("[SPARK-X] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less 

Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Hyukjin Kwon
Yeah, sounds good to have it.

In the case of R, it seems not quite common to write down the JIRA ID [1],
but it looks like some tests have the prefix in their names in general.
In the case of Python and Java, it seems we write a JIRA ID from time to
time in a comment right under the test method [2][3].

Given this pattern, I would like to suggest using the same format, but:

1. For Python and Java, write a single comment that starts with the JIRA ID
and a short description, e.g. (SPARK-XXXXX: test blah blah)
2. For R, use the JIRA ID as a prefix for the test name.

[1] git grep -r "SPARK-" -- '*test*.R'
[2] git grep -r "SPARK-" -- '*Suite.java'
[3] git grep -r "SPARK-" -- '*test*.py'

Does that make sense? Adding Felix and Shivaram too.


On Fri, Nov 15, 2019 at 3:13 AM, Shixiong(Ryan) Zhu wrote:

> Should we also add a guideline for non-Scala tests? Other languages (Java,
> Python, R) don't support using a string as a test name.
>
> Best Regards,
> Ryan
>
>
> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon  wrote:
>
>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>
>> On Wed, Nov 13, 2019 at 10:43 AM, Hyukjin Kwon wrote:
>>
>>> > In general a test should be self descriptive and I don't think we
>>> should be adding JIRA ticket references wholesale. Any action that the
>>> reader has to take to understand why a test was introduced is one too many.
>>> However in some cases the thing we are trying to test is very subtle and in
>>> that case a reference to a JIRA ticket might be useful, I do still feel
>>> that this should be a backstop and that properly documenting your tests is
>>> a much better way of dealing with this.
>>>
>>> Yeah, the test should be self-descriptive. I don't think adding a JIRA
>>> prefix harms this point. Probably I should add this sentence in the
>>> guidelines as well.
>>> Adding a JIRA prefix just adds one extra hint to track down details. I
>>> think it's fine to stick to this practice and make it simpler and
>>> clearer to follow.
>>>
>>> > 1. What if multiple JIRA IDs relate to the same test? Do we just take
>>> the very first JIRA ID?
>>> Ideally one JIRA should describe one issue, and one PR should fix one
>>> JIRA with a dedicated test.
>>> Yeah, I think I would take the very first JIRA ID.
>>>
>>> > 2. Are we going to do a full scan of all existing tests and attach a
>>> JIRA ID to each?
>>> Yea, let's not do this.
>>>
>>> > It's a nice-to-have, not super essential, just because ...
>>> It's been asked multiple times, and each committer seems to have a
>>> different understanding of this.
>>> It's not a biggie, but I wanted to make it clear and conclude this.
>>>
>>> > I'd add this only when a test specifically targets a certain issue.
>>> Yes, this one I am not sure about. From what I heard, people add the
>>> JIRA in the cases below:
>>>
>>> - Whenever the JIRA type is a bug
>>> - When a PR adds a couple of tests
>>> - Only when a test specifically targets a certain issue.
>>> - ...
>>>
>>> Which one do we prefer, and which is simpler to follow?
>>>
>>> Or I can combine them as below (I'll reword it when I actually document
>>> this):
>>> 1. In general, we should add a JIRA ID as a prefix of a test name when a
>>> PR targets a specific issue.
>>> In practice, this usually happens when the JIRA type is a bug or a PR
>>> adds a couple of tests.
>>> 2. Use the "SPARK-XXXXX: test name" format.
>>>
>>> If we have no objection with ^, let me go with this.
>>>
>>> On Wed, Nov 13, 2019 at 8:14 AM, Sean Owen wrote:
>>>
>>>> Let's suggest "SPARK-12345:" but not go back and change a bunch of test
>>>> cases.
>>>> I'd add this only when a test specifically targets a certain issue.
>>>> It's a nice-to-have, not super essential, just because in the rare
>>>> case you need to understand why a test asserts something, you can go
>>>> back and find what added it in the git history without much trouble.
>>>>
>>>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon 
>>>> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Maybe it's not a big deal, but it has brought some confusion from time
>>>> to time to the Spark dev community. I think it's time to discuss when/which
>>>> format to add a JI

Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Shixiong(Ryan) Zhu
Should we also add a guideline for non-Scala tests? Other languages (Java,
Python, R) don't support using a string as a test name.

Best Regards,
Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon  wrote:

> I opened a PR - https://github.com/apache/spark-website/pull/231
>
> On Wed, Nov 13, 2019 at 10:43 AM, Hyukjin Kwon wrote:
>
>> > In general a test should be self descriptive and I don't think we
>> should be adding JIRA ticket references wholesale. Any action that the
>> reader has to take to understand why a test was introduced is one too many.
>> However in some cases the thing we are trying to test is very subtle and in
>> that case a reference to a JIRA ticket might be useful, I do still feel
>> that this should be a backstop and that properly documenting your tests is
>> a much better way of dealing with this.
>>
>> Yeah, the test should be self-descriptive. I don't think adding a JIRA
>> prefix harms this point. Probably I should add this sentence in the
>> guidelines as well.
>> Adding a JIRA prefix just adds one extra hint to track down details. I
>> think it's fine to stick to this practice and make it simpler and
>> clearer to follow.
>>
>> > 1. What if multiple JIRA IDs relate to the same test? Do we just take
>> the very first JIRA ID?
>> Ideally one JIRA should describe one issue, and one PR should fix one JIRA
>> with a dedicated test.
>> Yeah, I think I would take the very first JIRA ID.
>>
>> > 2. Are we going to do a full scan of all existing tests and attach a
>> JIRA ID to each?
>> Yea, let's not do this.
>>
>> > It's a nice-to-have, not super essential, just because ...
>> It's been asked multiple times, and each committer seems to have a
>> different understanding of this.
>> It's not a biggie, but I wanted to make it clear and conclude this.
>>
>> > I'd add this only when a test specifically targets a certain issue.
>> Yes, this one I am not sure about. From what I heard, people add the JIRA
>> in the cases below:
>>
>> - Whenever the JIRA type is a bug
>> - When a PR adds a couple of tests
>> - Only when a test specifically targets a certain issue.
>> - ...
>>
>> Which one do we prefer, and which is simpler to follow?
>>
>> Or I can combine them as below (I'll reword it when I actually document
>> this):
>> 1. In general, we should add a JIRA ID as a prefix of a test name when a
>> PR targets a specific issue.
>> In practice, this usually happens when the JIRA type is a bug or a PR
>> adds a couple of tests.
>> 2. Use the "SPARK-XXXXX: test name" format.
>>
>> If we have no objection with ^, let me go with this.
>>
>> On Wed, Nov 13, 2019 at 8:14 AM, Sean Owen wrote:
>>
>>> Let's suggest "SPARK-12345:" but not go back and change a bunch of test
>>> cases.
>>> I'd add this only when a test specifically targets a certain issue.
>>> It's a nice-to-have, not super essential, just because in the rare
>>> case you need to understand why a test asserts something, you can go
>>> back and find what added it in the git history without much trouble.
>>>
>>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Maybe it's not a big deal, but it has brought some confusion from time
>>> > to time to the Spark dev community. I think it's time to discuss when,
>>> > and in which format, to add a JIRA ID as a prefix for the test case
>>> > name in Scala test cases.
>>> >
>>> > Currently we have many test case names with prefixes as below:
>>> >
>>> > test("SPARK-XXXXX blah blah")
>>> > test("SPARK-XXXXX: blah blah")
>>> > test("SPARK-XXXXX - blah blah")
>>> > test("[SPARK-XXXXX] blah blah")
>>> > …
>>> >
>>> > It is a good practice to have the JIRA ID in general because, for
>>> > instance, it takes less effort to track commit histories (even when
>>> > files are moved entirely) or to track information related to failed
>>> > tests. Considering Spark is getting big, I think it's good to document
>>> > this.
>>> >
>>> > I would like to suggest this and document it in our guideline:
>>> >
>>> > 1. Add a prefix to a test name when a PR adds a couple of tests.
>>> > 2. Use the "SPARK-XXXXX: test name" format, which is used most often
>>> > in our code base [1].
>>> >
>>> > We should make it simple and clear, but close to the actual practice.
>>> > So, I would like to listen to what other people think, and I would
>>> > appreciate feedback about when to add the JIRA prefix. One alternative
>>> > is that we only add the prefix when the JIRA's type is bug.
>>> >
>>> > [1]
>>> > git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>> >  923
>>> > git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>> >  477
>>> > git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>> >   16
>>> > git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>> >   13
>>> >
>>> >
>>> >
>>>
>>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Hyukjin Kwon
I opened a PR - https://github.com/apache/spark-website/pull/231

On Wed, Nov 13, 2019 at 10:43 AM, Hyukjin Kwon wrote:

> > In general a test should be self descriptive and I don't think we should
> be adding JIRA ticket references wholesale. Any action that the reader has
> to take to understand why a test was introduced is one too many. However in
> some cases the thing we are trying to test is very subtle and in that case
> a reference to a JIRA ticket might be useful, I do still feel that this
> should be a backstop and that properly documenting your tests is a much
> better way of dealing with this.
>
> Yeah, the test should be self-descriptive. I don't think adding a JIRA
> prefix harms this point. Probably I should add this sentence in the
> guidelines as well.
> Adding a JIRA prefix just adds one extra hint to track down details. I
> think it's fine to stick to this practice and make it simpler and clearer
> to follow.
>
> > 1. What if multiple JIRA IDs relate to the same test? Do we just take
> > the very first JIRA ID?
> Ideally one JIRA should describe one issue, and one PR should fix one JIRA
> with a dedicated test.
> Yeah, I think I would take the very first JIRA ID.
>
> > 2. Are we going to do a full scan of all existing tests and attach a
> > JIRA ID to each?
> Yea, let's not do this.
>
> > It's a nice-to-have, not super essential, just because ...
> It's been asked multiple times, and each committer seems to have a
> different understanding of this.
> It's not a biggie, but I wanted to make it clear and conclude this.
>
> > I'd add this only when a test specifically targets a certain issue.
> Yes, this one I am not sure about. From what I heard, people add the JIRA
> in the cases below:
>
> - Whenever the JIRA type is a bug
> - When a PR adds a couple of tests
> - Only when a test specifically targets a certain issue.
> - ...
>
> Which one do we prefer, and which is simpler to follow?
>
> Or I can combine them as below (I'll reword it when I actually document
> this):
> 1. In general, we should add a JIRA ID as a prefix of a test name when a
> PR targets a specific issue.
> In practice, this usually happens when the JIRA type is a bug or a PR
> adds a couple of tests.
> 2. Use the "SPARK-XXXXX: test name" format.
>
> If we have no objection with ^, let me go with this.
>
> On Wed, Nov 13, 2019 at 8:14 AM, Sean Owen wrote:
>
>> Let's suggest "SPARK-12345:" but not go back and change a bunch of test
>> cases.
>> I'd add this only when a test specifically targets a certain issue.
>> It's a nice-to-have, not super essential, just because in the rare
>> case you need to understand why a test asserts something, you can go
>> back and find what added it in the git history without much trouble.
>>
>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon 
>> wrote:
>> >
>> > Hi all,
>> >
>> > Maybe it's not a big deal, but it has brought some confusion from time
>> > to time to the Spark dev community. I think it's time to discuss when,
>> > and in which format, to add a JIRA ID as a prefix for the test case
>> > name in Scala test cases.
>> >
>> > Currently we have many test case names with prefixes as below:
>> >
>> > test("SPARK-XXXXX blah blah")
>> > test("SPARK-XXXXX: blah blah")
>> > test("SPARK-XXXXX - blah blah")
>> > test("[SPARK-XXXXX] blah blah")
>> > …
>> >
>> > It is a good practice to have the JIRA ID in general because, for
>> > instance, it takes less effort to track commit histories (even when
>> > files are moved entirely) or to track information related to failed
>> > tests. Considering Spark is getting big, I think it's good to document
>> > this.
>> >
>> > I would like to suggest this and document it in our guideline:
>> >
>> > 1. Add a prefix to a test name when a PR adds a couple of tests.
>> > 2. Use the "SPARK-XXXXX: test name" format, which is used most often
>> > in our code base [1].
>> >
>> > We should make it simple and clear, but close to the actual practice.
>> > So, I would like to listen to what other people think, and I would
>> > appreciate feedback about when to add the JIRA prefix. One alternative
>> > is that we only add the prefix when the JIRA's type is bug.
>> >
>> > [1]
>> > git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>> >  923
>> > git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>> >  477
>> > git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>> >   16
>> > git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>> >   13
>> >
>> >
>> >
>>
>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Hyukjin Kwon
> In general a test should be self descriptive and I don't think we should
be adding JIRA ticket references wholesale. Any action that the reader has
to take to understand why a test was introduced is one too many. However in
some cases the thing we are trying to test is very subtle and in that case
a reference to a JIRA ticket might be useful, I do still feel that this
should be a backstop and that properly documenting your tests is a much
better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA
prefix harms this point. Probably I should add this sentence in the
guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I
think it's fine to stick to this practice and make it simpler and clearer
to follow.

> 1. What if multiple JIRA IDs relate to the same test? Do we just take the
> very first JIRA ID?
Ideally one JIRA should describe one issue, and one PR should fix one JIRA
with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. Are we going to do a full scan of all existing tests and attach a JIRA
> ID to each?
Yea, let's not do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times, and each committer seems to have a different
understanding of this.
It's not a biggie, but I wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, this one I am not sure about. From what I heard, people add the JIRA in
the cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer, and which is simpler to follow?

Or I can combine them as below (I'll reword it when I actually document this):
1. In general, we should add a JIRA ID as a prefix of a test name when a PR
targets a specific issue.
In practice, this usually happens when the JIRA type is a bug or a PR adds
a couple of tests.
2. Use the "SPARK-XXXXX: test name" format.

If we have no objection with ^, let me go with this.
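
For illustration, a minimal ScalaTest sketch of the combined rule (the
suite, the test body, and SPARK-12345 are placeholders; it assumes
ScalaTest 3.1+'s AnyFunSuite):

import org.scalatest.funsuite.AnyFunSuite

class SampleSuite extends AnyFunSuite {
  // The JIRA ID prefix keeps the fix traceable even if this test is
  // later moved to another file.
  test("SPARK-12345: empty input produces an empty result") {
    assert(Seq.empty[Int].sum == 0)
  }
}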

On Wed, Nov 13, 2019 at 8:14 AM, Sean Owen wrote:

> Let's suggest "SPARK-12345:" but not go back and change a bunch of test
> cases.
> I'd add this only when a test specifically targets a certain issue.
> It's a nice-to-have, not super essential, just because in the rare
> case you need to understand why a test asserts something, you can go
> back and find what added it in the git history without much trouble.
>
> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon  wrote:
> >
> > Hi all,
> >
> > Maybe it's not a big deal, but it has brought some confusion from time
> > to time to the Spark dev community. I think it's time to discuss when,
> > and in which format, to add a JIRA ID as a prefix for the test case name
> > in Scala test cases.
> >
> > Currently we have many test case names with prefixes as below:
> >
> > test("SPARK-XXXXX blah blah")
> > test("SPARK-XXXXX: blah blah")
> > test("SPARK-XXXXX - blah blah")
> > test("[SPARK-XXXXX] blah blah")
> > …
> >
> > It is a good practice to have the JIRA ID in general because, for
> > instance, it takes less effort to track commit histories (even when
> > files are moved entirely) or to track information related to failed
> > tests. Considering Spark is getting big, I think it's good to document
> > this.
> >
> > I would like to suggest this and document it in our guideline:
> >
> > 1. Add a prefix to a test name when a PR adds a couple of tests.
> > 2. Use the "SPARK-XXXXX: test name" format, which is used most often in
> > our code base [1].
> >
> > We should make it simple and clear, but close to the actual practice.
> > So, I would like to listen to what other people think, and I would
> > appreciate feedback about when to add the JIRA prefix. One alternative
> > is that we only add the prefix when the JIRA's type is bug.
> >
> > [1]
> > git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
> >  923
> > git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
> >  477
> > git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
> >   16
> > git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
> >   13
> >
> >
> >
>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Sean Owen
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon  wrote:
>
> Hi all,
>
> Maybe it's not a big deal, but it has brought some confusion from time to
> time to the Spark dev community. I think it's time to discuss when, and in
> which format, to add a JIRA ID as a prefix for the test case name in Scala
> test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it takes less effort to track commit histories (even when files are moved
> entirely) or to track information related to failed tests.
> Considering Spark is getting big, I think it's good to document this.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix to a test name when a PR adds a couple of tests.
> 2. Use the "SPARK-XXXXX: test name" format, which is used most often in our
> code base [1].
>
> We should make it simple and clear, but close to the actual practice. So, I
> would like to listen to what other people think, and I would appreciate
> feedback about when to add the JIRA prefix. One alternative is that we only
> add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>  923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>  477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>   16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>   13
>
>
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Xin Ren
+1

Two points of confusion to clarify:
1. What if multiple JIRA IDs relate to the same test? Do we just take the
very first JIRA ID?
2. Are we going to do a full scan of all existing tests and attach a JIRA
ID to each?

Thank you Hyukjin :)

On Tue, Nov 12, 2019 at 1:47 PM Dongjoon Hyun 
wrote:

> Thank you for the suggestion, Hyukjin.
>
> Previously, we added JIRA IDs for bug fix PR test cases, as Gabor said.
>
> For new features (and improvements), we didn't add them,
> because all test cases in the newly added test suite share the same prefix
> JIRA ID in that case.
>
> It might look redundant.
>
> However, I'm +1 for Hyukjin's original suggestion, because we had better
> have an official rule for this in some way.
>
> Thank you again, Hyukjin.
>
> Bests,
> Dongjoon.
>
>
>
> On Tue, Nov 12, 2019 at 1:13 AM Gabor Somogyi 
> wrote:
>
>> +1 for having that consistent rule in test names.
>> +1 for making it a guideline.
>> +1 for defining exact guidelines in general.
>>
>> Until now I've followed the alternative (only add the prefix when the
>> JIRA's type is bug), and that way I knew that such tests contain edge
>> cases. In the case of new features, I'm pretty sure there is a reason to
>> introduce the prefix, but at the moment I can't imagine a use case where
>> it can help us (I want to convert this into a daily routine).
>>
>> > This is helpful when the test cases are moved to a different file.
>> The test can be found by its name, without the JIRA ID.
>>
>>
>> On Tue, Nov 12, 2019 at 5:31 AM Hyukjin Kwon  wrote:
>>
>>> In a few days, I will write this in our guidelines, probably after
>>> rewording it a bit:
>>>
>>> 1. Add a prefix to a test name when a PR adds a couple of tests.
>>> 2. Use the "SPARK-XXXXX: test name" format.
>>>
>>> Please let me know if you have a different opinion about what/when to
>>> write the JIRA ID as the prefix.
>>> I would like to make sure this simple rule is close to your actual
>>> practice.
>>>
>>>
>>> On Tue, Nov 12, 2019 at 8:41 AM, Gengliang wrote:
>>>
>>>> +1 for making it a guideline. This is helpful when the test cases are
>>>> moved to a different file.
>>>>
>>>> On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro 
>>>> wrote:
>>>>
>>>>> +1 for having that consistent rule in test names.
>>>>> This is a trivial problem, though; I think documenting this rule in
>>>>> the contribution guide could make the reviewer overhead a little
>>>>> smaller.
>>>>>
>>>>> Bests,
>>>>> Takeshi
>>>>>
>>>>> On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon 
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Maybe it's not a big deal, but it has brought some confusion from time
>>>>>> to time to the Spark dev community. I think it's time to discuss when,
>>>>>> and in which format, to add a JIRA ID as a prefix for the test case
>>>>>> name in Scala test cases.
>>>>>>
>>>>>> Currently we have many test case names with prefixes as below:
>>>>>>
>>>>>> - test("SPARK-XXXXX blah blah")
>>>>>> - test("SPARK-XXXXX: blah blah")
>>>>>> - test("SPARK-XXXXX - blah blah")
>>>>>> - test("[SPARK-XXXXX] blah blah")
>>>>>> - …
>>>>>>
>>>>>> It is a good practice to have the JIRA ID in general because, for
>>>>>> instance, it takes less effort to track commit histories (even when
>>>>>> files are moved entirely) or to track information related to failed
>>>>>> tests. Considering Spark is getting big, I think it's good to document
>>>>>> this.
>>>>>>
>>>>>> I would like to suggest this and document it in our guideline:
>>>>>>
>>>>>> 1. Add a prefix to a test name when a PR adds a couple of tests.
>>>>>> 2. Use the "SPARK-XXXXX: test name" format, which is used most often
>>>>>>    in our code base [1].
>>>>>>
>>>>>> We should make it simple and clear, but close to the actual practice.
>>>>>> So, I would like to listen to what other people think, and I would
>>>>>> appreciate feedback about when to add the JIRA prefix. One alternative
>>>>>> is that we only add the prefix when the JIRA's type is bug.
>>>>>>
>>>>>> [1]
>>>>>> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>>>>>  923
>>>>>> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>>>>>  477
>>>>>> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>>>>>   16
>>>>>> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>>>>>   13
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> ---
>>>>> Takeshi Yamamuro
>>>>>
>>>>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Dongjoon Hyun
Thank you for the suggestion, Hyukjin.

Previously, we added JIRA IDs for bug fix PR test cases, as Gabor said.

For new features (and improvements), we didn't add them,
because all test cases in the newly added test suite share the same prefix
JIRA ID in that case.

It might look redundant.

However, I'm +1 for Hyukjin's original suggestion, because we had better
have an official rule for this in some way.

Thank you again, Hyukjin.

Bests,
Dongjoon.



On Tue, Nov 12, 2019 at 1:13 AM Gabor Somogyi 
wrote:

> +1 for having that consistent rule in test names.
> +1 for making it a guideline.
> +1 for defining exact guidelines in general.
>
> Until now I've followed the alternative (only add the prefix when the
> JIRA's type is bug), and that way I knew that such tests contain edge
> cases. In the case of new features, I'm pretty sure there is a reason to
> introduce the prefix, but at the moment I can't imagine a use case where
> it can help us (I want to convert this into a daily routine).
>
> > This is helpful when the test cases are moved to a different file.
> The test can be found by its name, without the JIRA ID.
>
>
> On Tue, Nov 12, 2019 at 5:31 AM Hyukjin Kwon  wrote:
>
>> In a few days, I will write this in our guidelines, probably after
>> rewording it a bit:
>>
>> 1. Add a prefix to a test name when a PR adds a couple of tests.
>> 2. Use the "SPARK-XXXXX: test name" format.
>>
>> Please let me know if you have a different opinion about what/when to
>> write the JIRA ID as the prefix.
>> I would like to make sure this simple rule is close to your actual
>> practice.
>>
>>
>> On Tue, Nov 12, 2019 at 8:41 AM, Gengliang wrote:
>>
>>> +1 for making it a guideline. This is helpful when the test cases are
>>> moved to a different file.
>>>
>>> On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro 
>>> wrote:
>>>
>>>> +1 for having that consistent rule in test names.
>>>> This is a trivial problem, though; I think documenting this rule in the
>>>> contribution guide could make the reviewer overhead a little smaller.
>>>>
>>>> Bests,
>>>> Takeshi
>>>>
>>>> On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon 
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Maybe it's not a big deal, but it has brought some confusion from time
>>>>> to time to the Spark dev community. I think it's time to discuss when,
>>>>> and in which format, to add a JIRA ID as a prefix for the test case
>>>>> name in Scala test cases.
>>>>>
>>>>> Currently we have many test case names with prefixes as below:
>>>>>
>>>>> - test("SPARK-XXXXX blah blah")
>>>>> - test("SPARK-XXXXX: blah blah")
>>>>> - test("SPARK-XXXXX - blah blah")
>>>>> - test("[SPARK-XXXXX] blah blah")
>>>>> - …
>>>>>
>>>>> It is a good practice to have the JIRA ID in general because, for
>>>>> instance, it takes less effort to track commit histories (even when
>>>>> files are moved entirely) or to track information related to failed
>>>>> tests. Considering Spark is getting big, I think it's good to document
>>>>> this.
>>>>>
>>>>> I would like to suggest this and document it in our guideline:
>>>>>
>>>>> 1. Add a prefix to a test name when a PR adds a couple of tests.
>>>>> 2. Use the "SPARK-XXXXX: test name" format, which is used most often
>>>>>    in our code base [1].
>>>>>
>>>>> We should make it simple and clear, but close to the actual practice.
>>>>> So, I would like to listen to what other people think, and I would
>>>>> appreciate feedback about when to add the JIRA prefix. One alternative
>>>>> is that we only add the prefix when the JIRA's type is bug.
>>>>>
>>>>> [1]
>>>>> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>>>>  923
>>>>> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>>>>  477
>>>>> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>>>>   16
>>>>> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>>>>   13
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Gabor Somogyi
+1 for having that consistent rule in test names.
+1 for making it a guideline.
+1 for defining exact guidelines in general.

Until now I've followed the alternative (only add the prefix when the
JIRA's type is bug), and that way I knew that such tests contain edge cases.
In the case of new features, I'm pretty sure there is a reason to introduce
the prefix, but at the moment I can't imagine a use case where it can help
us (I want to convert this into a daily routine).

> This is helpful when the test cases are moved to a different file.
The test can be found by its name, without the JIRA ID.


On Tue, Nov 12, 2019 at 5:31 AM Hyukjin Kwon  wrote:

> In a few days, I will write this in our guidelines, probably after
> rewording it a bit:
>
> 1. Add a prefix to a test name when a PR adds a couple of tests.
> 2. Use the "SPARK-XXXXX: test name" format.
>
> Please let me know if you have a different opinion about what/when to
> write the JIRA ID as the prefix.
> I would like to make sure this simple rule is close to your actual
> practice.
>
>
> On Tue, Nov 12, 2019 at 8:41 AM, Gengliang wrote:
>
>> +1 for making it a guideline. This is helpful when the test cases are
>> moved to a different file.
>>
>> On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro 
>> wrote:
>>
>>> +1 for having that consistent rule in test names.
>>> This is a trivial problem, though; I think documenting this rule in the
>>> contribution guide could make the reviewer overhead a little smaller.
>>>
>>> Bests,
>>> Takeshi
>>>
>>> On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Maybe it's not a big deal, but it has brought some confusion from time
>>>> to time to the Spark dev community. I think it's time to discuss when,
>>>> and in which format, to add a JIRA ID as a prefix for the test case
>>>> name in Scala test cases.
>>>>
>>>> Currently we have many test case names with prefixes as below:
>>>>
>>>> - test("SPARK-XXXXX blah blah")
>>>> - test("SPARK-XXXXX: blah blah")
>>>> - test("SPARK-XXXXX - blah blah")
>>>> - test("[SPARK-XXXXX] blah blah")
>>>> - …
>>>>
>>>> It is a good practice to have the JIRA ID in general because, for
>>>> instance, it takes less effort to track commit histories (even when
>>>> files are moved entirely) or to track information related to failed
>>>> tests. Considering Spark is getting big, I think it's good to document
>>>> this.
>>>>
>>>> I would like to suggest this and document it in our guideline:
>>>>
>>>> 1. Add a prefix to a test name when a PR adds a couple of tests.
>>>> 2. Use the "SPARK-XXXXX: test name" format, which is used most often
>>>>    in our code base [1].
>>>>
>>>> We should make it simple and clear, but close to the actual practice.
>>>> So, I would like to listen to what other people think, and I would
>>>> appreciate feedback about when to add the JIRA prefix. One alternative
>>>> is that we only add the prefix when the JIRA's type is bug.
>>>>
>>>> [1]
>>>> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>>>  923
>>>> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>>>  477
>>>> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>>>   16
>>>> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>>>   13
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Hyukjin Kwon
In a few days, I will write this in our guidelines, probably after rewording
it a bit:

1. Add a prefix to a test name when a PR adds a couple of tests.
2. Use the "SPARK-XXXXX: test name" format.

Please let me know if you have a different opinion about what/when to
write the JIRA ID as the prefix.
I would like to make sure this simple rule is close to your actual
practice.


On Tue, Nov 12, 2019 at 8:41 AM, Gengliang wrote:

> +1 for making it a guideline. This is helpful when the test cases are
> moved to a different file.
>
> On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro 
> wrote:
>
>> +1 for having that consistent rule in test names.
>> This is a trivial problem, though; I think documenting this rule in the
>> contribution guide could make the reviewer overhead a little smaller.
>>
>> Bests,
>> Takeshi
>>
>> On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> Maybe it's not a big deal, but it has brought some confusion from time
>>> to time to the Spark dev community. I think it's time to discuss when,
>>> and in which format, to add a JIRA ID as a prefix for the test case
>>> name in Scala test cases.
>>>
>>> Currently we have many test case names with prefixes as below:
>>>
>>> - test("SPARK-XXXXX blah blah")
>>> - test("SPARK-XXXXX: blah blah")
>>> - test("SPARK-XXXXX - blah blah")
>>> - test("[SPARK-XXXXX] blah blah")
>>> - …
>>>
>>> It is a good practice to have the JIRA ID in general because, for
>>> instance, it takes less effort to track commit histories (even when
>>> files are moved entirely) or to track information related to failed
>>> tests. Considering Spark is getting big, I think it's good to document
>>> this.
>>>
>>> I would like to suggest this and document it in our guideline:
>>>
>>> 1. Add a prefix to a test name when a PR adds a couple of tests.
>>> 2. Use the "SPARK-XXXXX: test name" format, which is used most often
>>>    in our code base [1].
>>>
>>> We should make it simple and clear, but close to the actual practice.
>>> So, I would like to listen to what other people think, and I would
>>> appreciate feedback about when to add the JIRA prefix. One alternative
>>> is that we only add the prefix when the JIRA's type is bug.
>>>
>>> [1]
>>> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>>  923
>>> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>>  477
>>> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>>   16
>>> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>>   13
>>>
>>>
>>>
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Gengliang
+1 for making it a guideline. This is helpful when the test cases are moved
to a different file.

On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro 
wrote:

> +1 for having that consistent rule in test names.
> This is a trivial problem, though; I think documenting this rule in the
> contribution guide could make the reviewer overhead a little smaller.
>
> Bests,
> Takeshi
>
> On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> Maybe it's not a big deal, but it has brought some confusion from time
>> to time to the Spark dev community. I think it's time to discuss when,
>> and in which format, to add a JIRA ID as a prefix for the test case
>> name in Scala test cases.
>>
>> Currently we have many test case names with prefixes as below:
>>
>> - test("SPARK-XXXXX blah blah")
>> - test("SPARK-XXXXX: blah blah")
>> - test("SPARK-XXXXX - blah blah")
>> - test("[SPARK-XXXXX] blah blah")
>> - …
>>
>> It is a good practice to have the JIRA ID in general because, for
>> instance, it takes less effort to track commit histories (even when
>> files are moved entirely) or to track information related to failed
>> tests. Considering Spark is getting big, I think it's good to document
>> this.
>>
>> I would like to suggest this and document it in our guideline:
>>
>> 1. Add a prefix to a test name when a PR adds a couple of tests.
>> 2. Use the "SPARK-XXXXX: test name" format, which is used most often
>>    in our code base [1].
>>
>> We should make it simple and clear, but close to the actual practice.
>> So, I would like to listen to what other people think, and I would
>> appreciate feedback about when to add the JIRA prefix. One alternative
>> is that we only add the prefix when the JIRA's type is bug.
>>
>> [1]
>> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>  923
>> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>  477
>> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>   16
>> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>   13
>>
>>
>>
>>
>
> --
> ---
> Takeshi Yamamuro
>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Takeshi Yamamuro
+1 for having that consistent rule in test names.
This is a trivial problem, though; I think documenting this rule in the
contribution guide could make the reviewer overhead a little smaller.

Bests,
Takeshi

On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon  wrote:

> Hi all,
>
> Maybe it's not a big deal, but it has brought some confusion from time to
> time to the Spark dev community. I think it's time to discuss when, and in
> which format, to add a JIRA ID as a prefix for the test case name in Scala
> test cases.
>
> Currently we have many test case names with prefixes as below:
>
>    - test("SPARK-XXXXX blah blah")
>    - test("SPARK-XXXXX: blah blah")
>    - test("SPARK-XXXXX - blah blah")
>    - test("[SPARK-XXXXX] blah blah")
>    - …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it takes less effort to track commit histories (even when files are moved
> entirely) or to track information related to failed tests.
> Considering Spark is getting big, I think it's good to document this.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix to a test name when a PR adds a couple of tests.
> 2. Use the "SPARK-XXXXX: test name" format, which is used most often in our
>   code base [1].
>
> We should make it simple and clear, but close to the actual practice. So,
> I would like to listen to what other people think, and I would appreciate
> feedback about when to add the JIRA prefix. One alternative is that we
> only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>  923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>  477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>   16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>   13
>
>
>
>

-- 
---
Takeshi Yamamuro


Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Hyukjin Kwon
Hi all,

Maybe it's not a big deal, but it has brought some confusion from time to
time to the Spark dev community. I think it's time to discuss when, and in
which format, to add a JIRA ID as a prefix for the test case name in Scala
test cases.

Currently we have many test case names with prefixes as below:

   - test("SPARK-XXXXX blah blah")
   - test("SPARK-XXXXX: blah blah")
   - test("SPARK-XXXXX - blah blah")
   - test("[SPARK-XXXXX] blah blah")
   - …

It is a good practice to have the JIRA ID in general because, for instance,
it takes less effort to track commit histories (even when files are moved
entirely) or to track information related to failed tests.
Considering Spark is getting big, I think it's good to document this.

I would like to suggest this and document it in our guideline:

1. Add a prefix to a test name when a PR adds a couple of tests.
2. Use the "SPARK-XXXXX: test name" format, which is used most often in our
  code base [1].
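
For example, a minimal sketch of the proposed format (the suite, the test
body, and SPARK-12345 are placeholders; it assumes ScalaTest 3.1+'s
AnyFunSuite):

import org.scalatest.funsuite.AnyFunSuite

class FormatExampleSuite extends AnyFunSuite {
  // Preferred form per the guideline above: "SPARK-XXXXX: test name".
  test("SPARK-12345: missing keys return None from the map") {
    assert(Map(1 -> "a").get(2).isEmpty)
  }
}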

We should make it simple and clear, but close to the actual practice. So, I
would like to listen to what other people think, and I would appreciate
feedback about when to add the JIRA prefix. One alternative is that we only
add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
 923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
 477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
  16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
  13


Re: Standardizing test build config

2019-08-28 Thread Shane Knapp
> I'm surfacing this to dev@ as the right answers may depend on a lot of
> historical decisions that I don't know about.
>
yeah, it's been time to clean up the build configs for quite a while...

also, the vast majority of these build configs predate even my joining
the amplab.  josh rosen and patrick wendell would have a LOT more
context about some of the decisions made.

> See https://issues.apache.org/jira/browse/SPARK-28900 for a summary of
> how the different build configs are set up, and why we might need to
> standardize them to fully test with JDK 11 at least, and why we could
> probably collapse some too.
>
> Comments welcome on the JIRA, as I'm sure I'm missing a thing or two.

i will definitely be adding my thoughts, but i most likely won't be
able to get to this until after the labor day holiday (i'm busy
writing performance reviews).

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Standardizing test build config

2019-08-28 Thread Sean Owen
I'm surfacing this to dev@ as the right answers may depend on a lot of
historical decisions that I don't know about.

See https://issues.apache.org/jira/browse/SPARK-28900 for a summary of
how the different build configs are set up, and why we might need to
standardize them to fully test with JDK 11 at least, and why we could
probably collapse some too.

Comments welcome on the JIRA, as I'm sure I'm missing a thing or two.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: displaying "Test build" in PR

2019-08-13 Thread Wenchen Fan
"Can one of the admins verify this patch?" is a corrected message, as
Jenkins won't test your PR until an admin approves it.

BTW I think "5 minutes" is a reasonable delay for PR testing. It usually
takes days to review and merge a PR, so I don't think seeing test progress
right after PR creation really matters.

On Tue, Aug 13, 2019 at 8:58 PM Younggyu Chun 
wrote:

> Thank you for your email.
>
> I think a newb like me might want to see what's going on with a PR and see
> something useful. For example, "Request builder polls every 5 minutes and
> you will see the progress here in a few minutes".  I guess we can add a
> more useful message on AmplabJenkins <https://github.com/AmplabJenkins>'
> message instead of a simple message like "Can one of the admins verify
> this patch?"
>
> Younggyu
>
> On Mon, Aug 12, 2019 at 3:55 PM Shane Knapp  wrote:
>
>> when you create a PR, the jenkins pull request builder job polls every ~5
>> or so minutes and will trigger jobs based on creation/approval to test/code
>> updates/etc.
>>
>> On Mon, Aug 12, 2019 at 11:25 AM Younggyu Chun 
>> wrote:
>>
>>> Hi All,
>>>
>>> I have a quick question about PRs. Once I create a PR, I'm not able to
>>> see if "Test build" is being processed; I can only see it a few minutes
>>> or hours later. Is it possible to see whether "Test build" is being
>>> processed right after the PR is created?
>>>
>>> Thank you,
>>> Younggyu Chun
>>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>


Re: displaying "Test build" in PR

2019-08-13 Thread Younggyu Chun
Thank you for your email.

I think a newb like me might want to see what's going on with a PR and see
something useful. For example, "Request builder polls every 5 minutes and
you will see the progress here in a few minutes".  I guess we can add a
more useful message on AmplabJenkins <https://github.com/AmplabJenkins>'
message instead of a simple message like "Can one of the admins verify this
patch?"

Younggyu

On Mon, Aug 12, 2019 at 3:55 PM Shane Knapp  wrote:

> when you create a PR, the jenkins pull request builder job polls every ~5
> or so minutes and will trigger jobs based on creation/approval to test/code
> updates/etc.
>
> On Mon, Aug 12, 2019 at 11:25 AM Younggyu Chun 
> wrote:
>
>> Hi All,
>>
>> I have a quick question about PRs. Once I create a PR, I'm not able to
>> see if "Test build" is being processed; I can only see it a few minutes
>> or hours later. Is it possible to see whether "Test build" is being
>> processed right after the PR is created?
>>
>> Thank you,
>> Younggyu Chun
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: displaying "Test build" in PR

2019-08-12 Thread Shane Knapp
when you create a PR, the jenkins pull request builder job polls every ~5
or so minutes and will trigger jobs based on creation/approval to test/code
updates/etc.

On Mon, Aug 12, 2019 at 11:25 AM Younggyu Chun 
wrote:

> Hi All,
>
> I have a quick question about PRs. Once I create a PR, I'm not able to
> see if "Test build" is being processed; I can only see it a few minutes
> or hours later. Is it possible to see whether "Test build" is being
> processed right after the PR is created?
>
> Thank you,
> Younggyu Chun
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


displaying "Test build" in PR

2019-08-12 Thread Younggyu Chun
Hi All,

I have a quick question about PRs. Once I create a PR, I'm not able to see
if "Test build" is being processed; I can only see it a few minutes or
hours later. Is it possible to see whether "Test build" is being processed
right after the PR is created?

Thank you,
Younggyu Chun


Re: sparkmaster-test-sbt-hadoop-2.7 failing RAT check

2019-06-24 Thread shane knapp
ah, ok.  thanks for letting me know.  :)

On Mon, Jun 24, 2019 at 9:39 AM Sean Owen  wrote:

> (We have two PRs to patch it up anyway already)
>
> On Mon, Jun 24, 2019 at 11:39 AM shane knapp  wrote:
> >
> > i'm aware and will be looking into this later today.
> >
> > see:
> >
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/6043/console
> >
> > --
> > Shane Knapp
> > UC Berkeley EECS Research / RISELab Staff Technical Lead
> > https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: sparkmaster-test-sbt-hadoop-2.7 failing RAT check

2019-06-24 Thread Sean Owen
(We have two PRs to patch it up anyway already)

On Mon, Jun 24, 2019 at 11:39 AM shane knapp  wrote:
>
> i'm aware and will be looking into this later today.
>
> see:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/6043/console
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



sparkmaster-test-sbt-hadoop-2.7 failing RAT check

2019-06-24 Thread shane knapp
i'm aware and will be looking into this later today.

see:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/6043/console

-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Raise Jenkins test timeout? with alternatives

2019-04-11 Thread Sean Owen
I have a big PR that keeps failing because it hits the 300-minute build timeout:

https://github.com/apache/spark/pull/24314
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4703/console

It's because it touches so much code that all tests run, including
things like Kinesis. It looks like 300 mins isn't enough. We can raise
it to an eye-watering 360 minutes if that's just how long all tests
take.

I can also try splitting up the change to move out changes to a few
optional modules into separate PRs.

(Because this one makes it all the way through Python and Java tests
and almost all R tests several times, and doesn't touch Python or R
and shouldn't have any functional changes, I'm tempted to just merge
it, too, as a solution)

Thoughts?

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Run a specific PySpark test or group of tests

2018-12-06 Thread Xiao Li
Yes! This is very helpful!

On Wed, Dec 5, 2018 at 9:21 PM Wenchen Fan  wrote:

> great job! thanks a lot!
>
> On Thu, Dec 6, 2018 at 9:39 AM Hyukjin Kwon  wrote:
>
>> It's merged now and in the developer tools page -
>> http://spark.apache.org/developer-tools.html#individual-tests
>> Have some fun with PySpark testing!
>>
>> On Wed, Dec 5, 2018 at 4:30 PM, Hyukjin Kwon wrote:
>>
>>> Hey all, I kind of met the goal with a minimised fix, keeping the
>>> available framework and options. See
>>>
>>> https://github.com/apache/spark/pull/23203
>>> https://github.com/apache/spark-website/pull/161
>>>
>>> I know it's not perfect, and other Python testing frameworks provide
>>> many other good features, but this should be good enough for now.
>>> Thanks!
>>>
>>>
>>> On Thu, Aug 17, 2017 at 2:38 AM, Nicholas Chammas wrote:
>>>
>>>> Looks like it doesn’t take too much work to get pytest working on our
>>>> code base, since it knows how to run unittest tests.
>>>>
>>>> https://github.com/apache/spark/compare/master...nchammas:pytest
>>>>
>>>> For example I was able to do this from that branch and it did the right
>>>> thing, running only the tests with string in their name:
>>>>
>>>> python [pytest *]$ ../bin/spark-submit ./pytest-run-tests.py 
>>>> ./pyspark/sql/tests.py -v -k string
>>>>
>>>> However, looking more closely at the whole test setup, I’m hesitant to
>>>> work any further on this.
>>>>
>>>> My intention was to see if we could leverage pytest, tox, and other
>>>> test tools that are standard in the Python ecosystem to replace some of the
>>>> homegrown stuff we have. We have our own test dependency tracking code, our
>>>> own breakdown of tests into module-scoped chunks, and our own machinery to
>>>> parallelize test execution. It seems like it would be a lot of work to reap
>>>> the benefits of using the standard tools while ensuring that we don’t lose
>>>> any of the benefits our current test setup provides.
>>>>
>>>> Nick
>>>>
>>>> On Tue, Aug 15, 2017 at 3:26 PM Bryan Cutler <cutl...@gmail.com> wrote:
>>>>
>>>> This generally works for me to just run tests within a class or even a
>>>>> single test.  Not as flexible as pytest -k, which would be nice..
>>>>>
>>>>> $ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests
>>>>> On Tue, Aug 15, 2017 at 5:49 AM, Nicholas Chammas <
>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>>> Pytest does support unittest-based tests
>>>>>> <https://docs.pytest.org/en/latest/unittest.html>, allowing for
>>>>>> incremental adoption. I'll see how convenient it is to use with our 
>>>>>> current
>>>>>> test layout.
>>>>>>
>>>>>> On Tue, Aug 15, 2017 at 1:03 AM Hyukjin Kwon 
>>>>>> wrote:
>>>>>>
>>>>>>> For me, I would like this if this can be done with relatively small
>>>>>>> changes.
>>>>>>> How about adding more granular options, for example, specifying or
>>>>>>> filtering a smaller set of test goals in the run-tests.py script?
>>>>>>> I think it'd be quite a small change, and we could roughly reach this
>>>>>>> goal if I understood correctly.
>>>>>>>
>>>>>>>
>>>>>>> 2017-08-15 3:06 GMT+09:00 Nicholas Chammas <
>>>>>>> nicholas.cham...@gmail.com>:
>>>>>>>
>>>>>>>> Say you’re working on something and you want to rerun the PySpark
>>>>>>>> tests, focusing on a specific test or group of tests. Is there a way 
>>>>>>>> to do
>>>>>>>> that?
>>>>>>>>
>>>>>>>> I know that you can test entire modules with this:
>>>>>>>>
>>>>>>>> ./python/run-tests --modules pyspark-sql
>>>>>>>>
>>>>>>>> But I’m looking for something more granular, like pytest’s -k
>>>>>>>> option.
>>>>>>>>
>>>>>>>> On that note, does anyone else think it would be valuable to use a
>>>>>>>> test runner like pytest to run our Python tests? The biggest benefits 
>>>>>>>> would
>>>>>>>> be the use of fixtures
>>>>>>>> <https://docs.pytest.org/en/latest/fixture.html>, and more
>>>>>>>> flexibility on test running and reporting. Just wondering if we’ve 
>>>>>>>> already
>>>>>>>> considered this.
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> ​
>>>>>>>>
>>>>>>>
>>>>>>> ​
>>>>
>>>
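
For readers landing on this thread from a search: the granular runner that
got merged is documented on the developer-tools page linked above. If I
recall the merged syntax correctly (the linked page is authoritative), it
accepts qualified test names, roughly like:

$ ./python/run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'

with module-level runs (./python/run-tests --modules pyspark-sql) still
available alongside it.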



Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Wenchen Fan
great job! thanks a lot!

On Thu, Dec 6, 2018 at 9:39 AM Hyukjin Kwon  wrote:

> It's merged now and is on the developer tools page -
> http://spark.apache.org/developer-tools.html#individual-tests
> Have some fun with PySpark testing!
>
> On Wed, Dec 5, 2018 at 4:30 PM, Hyukjin Kwon wrote:
>
>> Hey all, I kind of met the goal with a minimised fix, keeping the
>> available framework and options. See
>>
>> https://github.com/apache/spark/pull/23203
>> https://github.com/apache/spark-website/pull/161
>>
>> I know it's not perfect and other Python testing frameworks provide many
>> other good features, but it should be good enough for now.
>> Thanks!
>>
>>
>> On Thu, Aug 17, 2017 at 2:38 AM, Nicholas Chammas wrote:
>>
>>> Looks like it doesn’t take too much work to get pytest working on our
>>> code base, since it knows how to run unittest tests.
>>>
>>> https://github.com/apache/spark/compare/master...nchammas:pytest
>>>
>>> For example I was able to do this from that branch and it did the right
>>> thing, running only the tests with string in their name:
>>>
>>> python [pytest *]$ ../bin/spark-submit ./pytest-run-tests.py 
>>> ./pyspark/sql/tests.py -v -k string
>>>
>>> However, looking more closely at the whole test setup, I’m hesitant to
>>> work any further on this.
>>>
>>> My intention was to see if we could leverage pytest, tox, and other test
>>> tools that are standard in the Python ecosystem to replace some of the
>>> homegrown stuff we have. We have our own test dependency tracking code, our
>>> own breakdown of tests into module-scoped chunks, and our own machinery to
>>> parallelize test execution. It seems like it would be a lot of work to reap
>>> the benefits of using the standard tools while ensuring that we don’t lose
>>> any of the benefits our current test setup provides.
>>>
>>> Nick
>>>
>>> On Tue, Aug 15, 2017 at 3:26 PM Bryan Cutler <cutl...@gmail.com> wrote:
>>>
>>> This generally works for me to just run tests within a class or even a
>>>> single test.  Not as flexible as pytest -k, which would be nice..
>>>>
>>>> $ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests
>>>> On Tue, Aug 15, 2017 at 5:49 AM, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> Pytest does support unittest-based tests
>>>>> <https://docs.pytest.org/en/latest/unittest.html>, allowing for
>>>>> incremental adoption. I'll see how convenient it is to use with our 
>>>>> current
>>>>> test layout.
>>>>>
>>>>> On Tue, Aug 15, 2017 at 1:03 AM Hyukjin Kwon 
>>>>> wrote:
>>>>>
>>>>>> For me, I would like this if this can be done with relatively small
>>>>>> changes.
>>>>>> How about adding more granular options, for example, specifying or
>>>>>> filtering a smaller set of test goals in the run-tests.py script?
>>>>>> I think it'd be quite a small change, and we could roughly reach this
>>>>>> goal if I understood correctly.
>>>>>>
>>>>>>
>>>>>> 2017-08-15 3:06 GMT+09:00 Nicholas Chammas <
>>>>>> nicholas.cham...@gmail.com>:
>>>>>>
>>>>>>> Say you’re working on something and you want to rerun the PySpark
>>>>>>> tests, focusing on a specific test or group of tests. Is there a way to 
>>>>>>> do
>>>>>>> that?
>>>>>>>
>>>>>>> I know that you can test entire modules with this:
>>>>>>>
>>>>>>> ./python/run-tests --modules pyspark-sql
>>>>>>>
>>>>>>> But I’m looking for something more granular, like pytest’s -k
>>>>>>> option.
>>>>>>>
>>>>>>> On that note, does anyone else think it would be valuable to use a
>>>>>>> test runner like pytest to run our Python tests? The biggest benefits 
>>>>>>> would
>>>>>>> be the use of fixtures
>>>>>>> <https://docs.pytest.org/en/latest/fixture.html>, and more
>>>>>>> flexibility on test running and reporting. Just wondering if we’ve 
>>>>>>> already
>>>>>>> considered this.
>>>>>>>
>>>>>>> Nick
>>>>>>> ​
>>>>>>>
>>>>>>
>>>>>> ​
>>>
>>


Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
It's merged now and is on the developer tools page -
http://spark.apache.org/developer-tools.html#individual-tests
Have some fun with PySpark testing!

On Wed, Dec 5, 2018 at 4:30 PM, Hyukjin Kwon wrote:

> Hey all, I kind of met the goal with a minimised fix, keeping the
> available framework and options. See
>
> https://github.com/apache/spark/pull/23203
> https://github.com/apache/spark-website/pull/161
>
> I know it's not perfect and other Python testing frameworks provide many
> other good features, but it should be good enough for now.
> Thanks!
>
>
> On Thu, Aug 17, 2017 at 2:38 AM, Nicholas Chammas wrote:
>
>> Looks like it doesn’t take too much work to get pytest working on our
>> code base, since it knows how to run unittest tests.
>>
>> https://github.com/apache/spark/compare/master...nchammas:pytest
>>
>> For example I was able to do this from that branch and it did the right
>> thing, running only the tests with string in their name:
>>
>> python [pytest *]$ ../bin/spark-submit ./pytest-run-tests.py 
>> ./pyspark/sql/tests.py -v -k string
>>
>> However, looking more closely at the whole test setup, I’m hesitant to
>> work any further on this.
>>
>> My intention was to see if we could leverage pytest, tox, and other test
>> tools that are standard in the Python ecosystem to replace some of the
>> homegrown stuff we have. We have our own test dependency tracking code, our
>> own breakdown of tests into module-scoped chunks, and our own machinery to
>> parallelize test execution. It seems like it would be a lot of work to reap
>> the benefits of using the standard tools while ensuring that we don’t lose
>> any of the benefits our current test setup provides.
>>
>> Nick
>>
>> On Tue, Aug 15, 2017 at 3:26 PM Bryan Cutler <cutl...@gmail.com> wrote:
>>
>> This generally works for me to just run tests within a class or even a
>>> single test.  Not as flexible as pytest -k, which would be nice..
>>>
>>> $ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests
>>> On Tue, Aug 15, 2017 at 5:49 AM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> Pytest does support unittest-based tests
>>>> <https://docs.pytest.org/en/latest/unittest.html>, allowing for
>>>> incremental adoption. I'll see how convenient it is to use with our current
>>>> test layout.
>>>>
>>>> On Tue, Aug 15, 2017 at 1:03 AM Hyukjin Kwon 
>>>> wrote:
>>>>
>>>>> For me, I would like this if this can be done with relatively small
>>>>> changes.
>>>>> How about adding more granular options, for example, specifying or
>>>>> filtering a smaller set of test goals in the run-tests.py script?
>>>>> I think it'd be quite a small change, and we could roughly reach this
>>>>> goal if I understood correctly.
>>>>>
>>>>>
>>>>> 2017-08-15 3:06 GMT+09:00 Nicholas Chammas :
>>>>>
>>>>>> Say you’re working on something and you want to rerun the PySpark
>>>>>> tests, focusing on a specific test or group of tests. Is there a way to 
>>>>>> do
>>>>>> that?
>>>>>>
>>>>>> I know that you can test entire modules with this:
>>>>>>
>>>>>> ./python/run-tests --modules pyspark-sql
>>>>>>
>>>>>> But I’m looking for something more granular, like pytest’s -k option.
>>>>>>
>>>>>> On that note, does anyone else think it would be valuable to use a
>>>>>> test runner like pytest to run our Python tests? The biggest benefits 
>>>>>> would
>>>>>> be the use of fixtures
>>>>>> <https://docs.pytest.org/en/latest/fixture.html>, and more
>>>>>> flexibility on test running and reporting. Just wondering if we’ve 
>>>>>> already
>>>>>> considered this.
>>>>>>
>>>>>> Nick
>>>>>> ​
>>>>>>
>>>>>
>>>>> ​
>>
>


Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
Hey all, I kind of met the goal with a minimised fix, keeping the available
framework and options. See

https://github.com/apache/spark/pull/23203
https://github.com/apache/spark-website/pull/161

I know it's not perfect and other Python testing frameworks provide many
other good features, but it should be good enough for now.
Thanks!


On Thu, Aug 17, 2017 at 2:38 AM, Nicholas Chammas wrote:

> Looks like it doesn’t take too much work to get pytest working on our code
> base, since it knows how to run unittest tests.
>
> https://github.com/apache/spark/compare/master...nchammas:pytest
>
> For example I was able to do this from that branch and it did the right
> thing, running only the tests with string in their name:
>
> python [pytest *]$ ../bin/spark-submit ./pytest-run-tests.py 
> ./pyspark/sql/tests.py -v -k string
>
> However, looking more closely at the whole test setup, I’m hesitant to
> work any further on this.
>
> My intention was to see if we could leverage pytest, tox, and other test
> tools that are standard in the Python ecosystem to replace some of the
> homegrown stuff we have. We have our own test dependency tracking code, our
> own breakdown of tests into module-scoped chunks, and our own machinery to
> parallelize test execution. It seems like it would be a lot of work to reap
> the benefits of using the standard tools while ensuring that we don’t lose
> any of the benefits our current test setup provides.
>
> Nick
>
> On Tue, Aug 15, 2017 at 3:26 PM Bryan Cutler <cutl...@gmail.com> wrote:
>
> This generally works for me to just run tests within a class or even a
>> single test.  Not as flexible as pytest -k, which would be nice..
>>
>> $ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests
>> On Tue, Aug 15, 2017 at 5:49 AM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Pytest does support unittest-based tests
>>> <https://docs.pytest.org/en/latest/unittest.html>, allowing for
>>> incremental adoption. I'll see how convenient it is to use with our current
>>> test layout.
>>>
>>> On Tue, Aug 15, 2017 at 1:03 AM Hyukjin Kwon 
>>> wrote:
>>>
>>>> For me, I would like this if this can be done with relatively small
>>>> changes.
>>>> How about adding more granular options, for example, specifying or
>>>> filtering a smaller set of test goals in the run-tests.py script?
>>>> I think it'd be quite a small change, and we could roughly reach this goal
>>>> if I understood correctly.
>>>>
>>>>
>>>> 2017-08-15 3:06 GMT+09:00 Nicholas Chammas :
>>>>
>>>>> Say you’re working on something and you want to rerun the PySpark
>>>>> tests, focusing on a specific test or group of tests. Is there a way to do
>>>>> that?
>>>>>
>>>>> I know that you can test entire modules with this:
>>>>>
>>>>> ./python/run-tests --modules pyspark-sql
>>>>>
>>>>> But I’m looking for something more granular, like pytest’s -k option.
>>>>>
>>>>> On that note, does anyone else think it would be valuable to use a
>>>>> test runner like pytest to run our Python tests? The biggest benefits 
>>>>> would
>>>>> be the use of fixtures
>>>>> <https://docs.pytest.org/en/latest/fixture.html>, and more
>>>>> flexibility on test running and reporting. Just wondering if we’ve already
>>>>> considered this.
>>>>>
>>>>> Nick
>>>>> ​
>>>>>
>>>>
>>>> ​
>
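
Since the pytest-on-unittest point recurs throughout this thread, here is a
minimal self-contained illustration of what makes -k attractive; the file
and test names below are invented for the example:

# test_strings.py - ordinary unittest-style tests; pytest collects
# TestCase classes like this one without any changes to the code.
import unittest

class StringFunctionTests(unittest.TestCase):
    def test_string_upper(self):
        self.assertEqual("spark".upper(), "SPARK")

    def test_numeric_add(self):
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()

Running pytest -v -k string test_strings.py executes only test_string_upper,
because -k filters on test names - exactly the granularity the homegrown
scripts lacked when this thread started.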

