Re: [build system] jenkins down, working on it

2021-05-04 Thread shane knapp ☠
we're back and building!

On Tue, May 4, 2021 at 4:03 PM shane knapp ☠  wrote:

> jenkins went down some time in the past few days, and i'm currently
> investigating.
>
> if it's been down a while, i apologize as i've been dealing w/some health
> issues.
>
> shane
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


[build system] jenkins down, working on it

2021-05-04 Thread shane knapp ☠
jenkins went down some time in the past few days, and i'm currently
investigating.

if it's been down a while, i apologize as i've been dealing w/some health
issues.

shane
-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


RE: [VOTE] Release Spark 2.4.8 (RC3)

2021-05-04 Thread Liang-Chi Hsieh
Hi,

Yes, RC3 failed due to an ancient bug found during this RC.
I will cut RC4 soon.

Thank you.


Nicholas Marion wrote
> Hi,
> 
> Was it decided to fail RC3 in favor of RC4?
> 
> Regards,
> 
> NICHOLAS T. MARION
> AI and Analytics Development Lead | IzODA CPO
> Phone: 1-845-433-5010 | Tie-Line: 293-5010
> IBM | E-mail: nmarion@.ibm
> LinkedIn: http://www.linkedin.com/in/nicholasmarion
> 2455 South Rd, Poughkeepsie, New York 12601-5400, United States
> 
> From: Liang-Chi Hsieh <viirya@>
> To: dev@.apache
> Date: 04/30/2021 03:12 PM
> Subject: [EXTERNAL] Re: [VOTE] Release Spark 2.4.8 (RC3)
> 
> 
> 
> Hi all,
> 
> Thanks for actively voting. Unfortunately, we found a very ancient bug
> (SPARK-35278), and the fix (https://github.com/apache/spark/pull/32404) is
> going to be merged soon. We may fail this RC3.
> 
> I will cut RC4 as soon as the fix is merged.
> 
> Thank you!








RE: [VOTE] Release Spark 2.4.8 (RC3)

2021-05-04 Thread Nicholas Marion

Hi,

Was it decided to fail RC3 in favor of RC4?



   
Regards,

NICHOLAS T. MARION
AI and Analytics Development Lead | IzODA CPO
Phone: 1-845-433-5010 | Tie-Line: 293-5010
IBM | E-mail: nmar...@us.ibm.com
LinkedIn: http://www.linkedin.com/in/nicholasmarion
2455 South Rd, Poughkeepsie, New York 12601-5400, United States

From: Liang-Chi Hsieh
To: dev@spark.apache.org
Date: 04/30/2021 03:12 PM
Subject: [EXTERNAL] Re: [VOTE] Release Spark 2.4.8 (RC3)



Hi all,

Thanks for actively voting. Unfortunately, we found a very ancient bug
(SPARK-35278), and the fix (https://github.com/apache/spark/pull/32404) is
going to be merged soon. We may fail this RC3.

I will cut RC4 as soon as the fix is merged.

Thank you!








Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-05-04 Thread Паша
I've created my own implicit withColumnsRenamed for this purpose; it just
accepts a Map of String→String and calls withColumnRenamed multiple times.
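
A minimal sketch of one way to write such an implicit (the object and class
names here are illustrative, not the actual code):

import org.apache.spark.sql.DataFrame

object DataFrameSyntax {
  // Adds withColumnsRenamed to any DataFrame: fold the rename map over the
  // dataframe, one withColumnRenamed call per (old name -> new name) entry.
  implicit class RenameOps(val df: DataFrame) extends AnyVal {
    def withColumnsRenamed(renames: Map[String, String]): DataFrame =
      renames.foldLeft(df) { case (acc, (from, to)) =>
        acc.withColumnRenamed(from, to)
      }
  }
}

// Usage: import DataFrameSyntax._ and then
//   df.withColumnsRenamed(Map("old1" -> "new1", "old2" -> "new2"))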

On Tue, 4 May 2021 at 10:22, Yikun Jiang :

> @Saurabh @Mr.Powers Thanks for the input.
>
> I personally prefer to introduce `withColumns` because it brings a more
> friendly development experience than select(*).
>
> This is the PR to add `withColumns`:
> https://github.com/apache/spark/pull/32431
>
> Regards,
> Yikun
>
>
> Saurabh Chawla wrote on Fri, Apr 30, 2021 at 1:13 PM:
>
>> Hi All,
>>
>> I also had a scenario where, at runtime, I needed to loop over a
>> dataframe and call withColumn many times.
>>
>> To be on the safer side, I used reflection to access withColumns and
>> prevent any java.lang.StackOverflowError.
>>
>> // Look up Dataset.withColumns(Seq[String], Seq[Column]) via reflection,
>> // since the method is private to Spark.
>> val dataSetClass = Class.forName("org.apache.spark.sql.Dataset")
>> val newConfigurationMethod =
>>   dataSetClass.getMethod("withColumns", classOf[Seq[String]], classOf[Seq[Column]])
>> // columnName is a Seq[String] and columnValue a Seq[Column] here.
>> newConfigurationMethod.invoke(
>>   baseDataFrame, columnName, columnValue).asInstanceOf[DataFrame]
>>
>> It would be great if we could use withColumns directly rather than
>> reflection code like this, or change the code to merge the new Project
>> with the existing Project in the plan instead of adding a new Project
>> every time we call withColumn.
>>
>> +1 for exposing the *withColumns*
>>
>> Regards
>> Saurabh Chawla
>>
>> On Thu, Apr 22, 2021 at 1:03 PM Yikun Jiang  wrote:
>>
>>> Hi, all
>>>
>>> *Background:*
>>>
>>> Currently, there is a withColumns [1] method to help users/devs
>>> add/replace multiple columns at once. But this method is private and not
>>> exposed as a public API, which means users cannot call it directly, and
>>> it is not supported in the PySpark API.
>>>
>>> As a dataframe user, I can only call withColumn() multiple times:
>>>
>>> df.withColumn("key1", col("key1")).withColumn("key2", 
>>> col("key2")).withColumn("key3", col("key3"))
>>>
>>> rather than:
>>>
>>> df.withColumn(["key1", "key2", "key3"], [col("key1"), col("key2"), 
>>> col("key3")])
>>>
>>> Multiple calls bring a higher cost in both developer experience and
>>> performance; in a PySpark scenario especially, multiple calls mean
>>> multiple py4j calls.
>>>
>>> As mentioned by @Hyukjin, there were some previous discussions on
>>> SPARK-12225 [2].
>>>
>>> [1]
>>> https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2402
>>> [2] https://issues.apache.org/jira/browse/SPARK-12225
>>>
>>> *Potential solution:*
>>> Looks like there are 2 potential solutions if we want to support it:
>>>
>>> 1. Introduce a *withColumns* API for Scala/Python: a separate public
>>> withColumns API would be added to the Scala/Python APIs.
>>>
>>> 2. Make withColumn accept a *single col* and also a *list of cols*.
>>> I did an experimental try on PySpark in
>>> https://github.com/apache/spark/pull/32276
>>> but, as Maciej said, it will bring some confusion with naming.
>>>
>>>
>>> Thanks for reading; feel free to reply if you have any other
>>> concerns or suggestions!
>>>
>>>
>>> Regards,
>>> Yikun
>>>
>>


Re: [apache/spark-website] Update contributing to include code of conduct section (#335)

2021-05-04 Thread Sean Owen
Just FYI - proposed update to the CoC for the project. Looks reasonable to
simply adopt the ASF code of conduct, per the PR.

On Tue, May 4, 2021 at 2:02 AM Jungtaek Lim 
wrote:

> I think the rationale is great, but why not go through the dev@
> mailing list? Many contributors subscribe to the dev@ mailing list as well,
> and it would also be a good time to remind everyone of the CoC via your
> idea/discussion thread.
>
> I assume getting consensus to add this is just a matter of time (the CoC is
> already something the ASF requires of projects; we are just making it
> explicit here), but it might be ideal to reach a wider audience.


Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-05-04 Thread Yikun Jiang
@Saurabh @Mr.Powers Thanks for the input.

I personally prefer to introduce `withColumns` because it brings a more
friendly development experience than select(*).

This is the PR to add `withColumns`:
https://github.com/apache/spark/pull/32431
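
For illustration, a sketch of how one withColumns call could replace a chain
of withColumn calls, assuming the private Seq-based signature were exposed
as-is (the actual public shape is whatever the PR settles on):

import org.apache.spark.sql.functions.col

// Hypothetical single call replacing three chained withColumn calls.
val result = df.withColumns(
  Seq("key1", "key2", "key3"),
  Seq(col("key1"), col("key2"), col("key3"))
)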

Regards,
Yikun


Saurabh Chawla wrote on Fri, Apr 30, 2021 at 1:13 PM:

> Hi All,
>
> I also had a scenario where, at runtime, I needed to loop over a
> dataframe and call withColumn many times.
>
> To be on the safer side, I used reflection to access withColumns and
> prevent any java.lang.StackOverflowError.
>
> // Look up Dataset.withColumns(Seq[String], Seq[Column]) via reflection,
> // since the method is private to Spark.
> val dataSetClass = Class.forName("org.apache.spark.sql.Dataset")
> val newConfigurationMethod =
>   dataSetClass.getMethod("withColumns", classOf[Seq[String]], classOf[Seq[Column]])
> // columnName is a Seq[String] and columnValue a Seq[Column] here.
> newConfigurationMethod.invoke(
>   baseDataFrame, columnName, columnValue).asInstanceOf[DataFrame]
>
> It would be great if we could use withColumns directly rather than
> reflection code like this, or change the code to merge the new Project
> with the existing Project in the plan instead of adding a new Project
> every time we call withColumn.
>
> +1 for exposing the *withColumns*
>
> Regards
> Saurabh Chawla
>
> On Thu, Apr 22, 2021 at 1:03 PM Yikun Jiang  wrote:
>
>> Hi, all
>>
>> *Background:*
>>
>> Currently, there is a withColumns [1] method to help users/devs
>> add/replace multiple columns at once. But this method is private and not
>> exposed as a public API, which means users cannot call it directly, and
>> it is not supported in the PySpark API.
>>
>> As a dataframe user, I can only call withColumn() multiple times:
>>
>> df.withColumn("key1", col("key1")).withColumn("key2", 
>> col("key2")).withColumn("key3", col("key3"))
>>
>> rather than:
>>
>> df.withColumn(["key1", "key2", "key3"], [col("key1"), col("key2"), 
>> col("key3")])
>>
>> Multiple calls bring a higher cost in both developer experience and
>> performance; in a PySpark scenario especially, multiple calls mean
>> multiple py4j calls.
>>
>> As mentioned by @Hyukjin, there were some previous discussions on
>> SPARK-12225 [2].
>>
>> [1]
>> https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2402
>> [2] https://issues.apache.org/jira/browse/SPARK-12225
>>
>> *Potential solution:*
>> Looks like there are 2 potential solutions if we want to support it:
>>
>> 1. Introduce a *withColumns* API for Scala/Python: a separate public
>> withColumns API would be added to the Scala/Python APIs.
>>
>> 2. Make withColumn accept a *single col* and also a *list of cols*.
>> I did an experimental try on PySpark in
>> https://github.com/apache/spark/pull/32276
>> but, as Maciej said, it will bring some confusion with naming.
>>
>>
>> Thanks for reading; feel free to reply if you have any other
>> concerns or suggestions!
>>
>>
>> Regards,
>> Yikun
>>
>