Re: Bintray replacement for spark-packages.org

2021-04-29 Thread Dongjoon Hyun
I agree with Wenchen.

I can volunteer as the Apache Spark 3.1.2 release manager, at least.

Bests,
Dongjoon.


On Wed, Apr 28, 2021 at 10:15 AM Wenchen Fan  wrote:

> Shall we make new releases for 3.0 and 3.1, so that people don't need to
> change the sbt resolver/pom files to work around the Bintray sunset? It's
> also been a while since the last 3.0 and 3.1 releases.
>
> On Tue, Apr 27, 2021 at 9:02 AM Matthew Powers <
> matthewkevinpow...@gmail.com> wrote:
>
>> Great job fixing this!! I just checked and it's working on my end. Updated
>> the resolver and sbt test still works just fine.
>>
>> On Mon, Apr 26, 2021 at 3:31 AM Bo Zhang  wrote:
>>
>>> Hi Apache Spark devs,
>>>
>>> As you might know, Bintray, which is the repository service used for
>>> spark-packages.org, is in its sunset process. There was a planned
>>> brown-out on April 12th and there will be another one on April 26th, and
>>> it will no longer be available from May 1st.
>>>
>>> We have spun up a new repository service at
>>> https://repos.spark-packages.org and it will be the new home for the
>>> artifacts on spark-packages.
>>>
>>> Given the planned Bintray brown-out, this is a good time for us to test
>>> the new repository service. To consume artifacts from it, please replace
>>> "dl.bintray.com/spark-packages/maven" with "repos.spark-packages.org" in
>>> the Maven pom files or sbt build files in your repositories, e.g.:
>>> https://github.com/apache/spark/pull/32346
>>>
>>> We are still working on the release process to the new repository
>>> service, and will provide an update here shortly.
>>>
>>> If you have any questions about using the new repository service, or any
>>> general questions about spark-packages, please reach out to
>>> feedb...@spark-packages.org.
>>>
>>> Thanks,
>>> Bo
>>>
>>
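For reference, the resolver change Bo describes above amounts to a one-line swap in
an sbt build (a minimal sketch; the resolver name is arbitrary). The Maven case is
the same URL substitution inside the pom's repositories section:

// build.sbt
// old Bintray entry:
// resolvers += "SparkPackages" at "https://dl.bintray.com/spark-packages/maven"
// new repository service:
resolvers += "SparkPackages" at "https://repos.spark-packages.org"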


Should we add built in support for bouncy castle EC w/Kube

2021-04-29 Thread Holden Karau
Hi Folks,

I've deployed a new version of K3s locally and I ran into an issue
with the key format not being supported out of the box. We delegate to
fabric8 which has bouncy castle EC as an optional dependency. Adding
it would add ~6mb to the Kube jars. What do folks think?

Cheers,

Holden

P.S.

If you're running K3s in your lab as well and get "Exception in thread
"main" io.fabric8.kubernetes.client.KubernetesClientException:
JcaPEMKeyConverter is provided by BouncyCastle, an optional
dependency. To use support for EC Keys you must explicitly add this
dependency to classpath." I worked around it by adding
https://repo1.maven.org/maven2/org/bouncycastle/bcpkix-jdk15on/1.68/bcpkix-jdk15on-1.68.jar
& 
https://repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk15on/1.68/bcprov-jdk15on-1.68.jar
to my class path.
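If you manage the classpath with sbt instead of adding the raw jars, a rough
equivalent would be the following (a sketch only; the coordinates are read off the
jar URLs above):

// build.sbt
libraryDependencies ++= Seq(
  "org.bouncycastle" % "bcpkix-jdk15on" % "1.68",
  "org.bouncycastle" % "bcprov-jdk15on" % "1.68"
)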




-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau




Re: Should we add built in support for bouncy castle EC w/Kube

2021-04-29 Thread Sean Owen
I recall that Bouncy Castle has some crypto export implications. If it's in
the distro then I think we'd have to update
https://www.apache.org/licenses/exports/ to reflect that Bouncy Castle is
again included in the product. But that's doable. Just have to recall how
one updates that.

On Thu, Apr 29, 2021 at 1:08 PM Holden Karau  wrote:

> Hi Folks,
>
> I've deployed a new version of K3s locally and I ran into an issue
> with the key format not being supported out of the box. We delegate to
> fabric8 which has bouncy castle EC as an optional dependency. Adding
> it would add ~6mb to the Kube jars. What do folks think?
>
> Cheers,
>
> Holden
>
> P.S.
>
> If you're running K3s in your lab as well and get "Exception in thread
> "main" io.fabric8.kubernetes.client.KubernetesClientException:
> JcaPEMKeyConverter is provided by BouncyCastle, an optional
> dependency. To use support for EC Keys you must explicitly add this
> dependency to classpath." I worked around it by adding
>
> https://repo1.maven.org/maven2/org/bouncycastle/bcpkix-jdk15on/1.68/bcpkix-jdk15on-1.68.jar
> &
> 
> https://repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk15on/1.68/bcprov-jdk15on-1.68.jar
> to my class path.
>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
>


Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-04-29 Thread Saurabh Chawla
Hi All,

I also had a scenario where, at runtime, I needed to loop over a DataFrame
and call withColumn many times.

To be on the safer side, I used reflection to access withColumns and prevent
a java.lang.StackOverflowError:

import org.apache.spark.sql.{Column, DataFrame}

// Look up the non-public Dataset.withColumns(Seq[String], Seq[Column]) via reflection.
val dataSetClass = Class.forName("org.apache.spark.sql.Dataset")
val withColumnsMethod =
  dataSetClass.getMethod("withColumns", classOf[Seq[String]], classOf[Seq[Column]])
// columnName: Seq[String] of column names; columnValue: Seq[Column] of expressions.
withColumnsMethod.invoke(baseDataFrame, columnName, columnValue).asInstanceOf[DataFrame]

It would be great if we could use "withColumns" directly rather than
reflection code like this, or change the code to merge the new Project with
the existing Project in the plan instead of adding a new Project every time
"withColumn" is called.
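For context, the withColumn-in-a-loop pattern that the reflection call above works
around looks roughly like this (a sketch reusing the columnName/columnValue
variables from the snippet above; each iteration adds another Project node to the
analyzed plan, which is what eventually blows the stack):

val looped = columnName.zip(columnValue).foldLeft(baseDataFrame) {
  case (df, (name, value)) => df.withColumn(name, value) // one extra Project per call
}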

+1 for exposing the *withColumns*

Regards
Saurabh Chawla

On Thu, Apr 22, 2021 at 1:03 PM Yikun Jiang  wrote:

> Hi, all
>
> *Background:*
>
> Currently, there is a withColumns [1] method to help users/devs add/replace
> multiple columns at once. But this method is private and isn't exposed as a
> public API, which means users cannot call it directly, and it is not
> supported in the PySpark API.
>
> As a DataFrame user, I can only call withColumn() multiple times:
>
> df.withColumn("key1", col("key1")).withColumn("key2", col("key2")).withColumn("key3", col("key3"))
>
> rather than:
>
> df.withColumn(["key1", "key2", "key3"], [col("key1"), col("key2"), col("key3")])
>
> Multiple calls bring a higher cost in both developer experience and
> performance, especially in PySpark, where multiple calls mean multiple Py4J
> calls.
>
> As mentioned by @Hyukjin, there were some previous discussions on SPARK-12225 [2].
>
> [1]
> https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2402
> [2] https://issues.apache.org/jira/browse/SPARK-12225
>
> *Potential solutions:*
> It looks like there are 2 potential solutions if we want to support this:
>
> 1. Introduce a *withColumns* API for Scala/Python.
> A separate public withColumns API would be added to the Scala and Python APIs.
>
> 2. Make withColumn accept a *single col* as well as a *list of cols*.
> I did an experimental try on PySpark in
> https://github.com/apache/spark/pull/32276
> but, just like Maciej said, it will bring some confusion with naming.
>
>
> Thanks for your reading, feel free to reply if you have any other concerns
> or suggestions!
>
>
> Regards,
> Yikun
>
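For readers following the thread, a user-land approximation of the withColumns
semantics described in option 1 (add or replace several columns with a single
select, i.e. a single Project) could look like the sketch below. This is neither
the private Dataset.withColumns implementation nor the proposed public API; it
assumes case-sensitive column-name matching and is only meant to illustrate the idea:

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Add or replace the given columns in one pass, producing one select/Project
// instead of a chain of withColumn calls.
def withColumnsSketch(df: DataFrame, colNames: Seq[String], cols: Seq[Column]): DataFrame = {
  require(colNames.size == cols.size, "colNames and cols must have the same length")
  val replacements = colNames.zip(cols).toMap
  // Existing columns: use the new definition when one is supplied, otherwise keep as-is.
  val kept = df.columns.toSeq.map { name =>
    replacements.get(name).map(_.as(name)).getOrElse(col(name))
  }
  // Names that don't exist yet are appended at the end.
  val appended = colNames.zip(cols).collect {
    case (name, c) if !df.columns.contains(name) => c.as(name)
  }
  df.select((kept ++ appended): _*)
}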