Re: Plans for built-in v2 data sources in Spark 4

2023-09-14 Thread Dongjoon Hyun
Hi, Will.

According to the following JIRA, as of now, there is no plan or ongoing
discussion to switch the default.

https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark
4.0.0)

Thanks,
Dongjoon.


On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski wrote:

> Hey everyone,
>
> I was wondering what the plans are for Spark's built-in v2 file data
> sources in Spark 4.
>
> Concretely, is the plan for Spark 4 to continue defaulting to the built-in
> v1 data sources? And if yes, what are the blockers for defaulting to v2? I
> see, just as an example, that writing Hive-style partitioned tables is not
> supported in v2. Are there other blockers or outstanding discussions?
>
> Regards,
>
> Will

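On the v1-versus-v2 default question above: the set of built-in file sources that Spark keeps on the v1 path is driven by the `spark.sql.sources.useV1SourceList` SQL config. A minimal sketch of opting a single source into the v2 path, assuming the Spark 3.x config name and its default value carry over unchanged to Spark 4:

```
# spark-defaults.conf sketch (assumption: Spark 3.x behavior).
# The default value is "avro,csv,json,kafka,orc,parquet,text"; any source
# removed from the list resolves to its v2 implementation instead.
# Here parquet is removed, so reads/writes of parquet use the v2 path:
spark.sql.sources.useV1SourceList  avro,csv,json,kafka,orc,text
```

This only flips which implementation is resolved at planning time; it does not address the v2 feature gaps (such as the Hive-partitioned-write limitation) raised in the thread.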

Re: Write Spark Connection client application in Go

2023-09-14 Thread bo yang
Thanks Holden and Martin for the nice words and feedback :)

On Wed, Sep 13, 2023 at 8:22 AM Martin Grund wrote:

> This is absolutely awesome! Thank you so much for dedicating your time to
> this project!
>
>
> On Wed, Sep 13, 2023 at 6:04 AM Holden Karau wrote:
>
>> That’s so cool! Great work y’all :)
>>
>> On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote:
>>
>>> Hi Spark Friends,
>>>
>>> Anyone interested in using Golang to write Spark applications? We created
>>> a Spark Connect Go Client library. Would love to hear feedback/thoughts
>>> from the community.
>>>
>>> Please see the quick start guide about how to use it. Following is a very
>>> short Spark Connect application in Go:
>>>
>>> func main() {
>>>     spark, _ := sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
>>>     defer spark.Stop()
>>>
>>>     df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
>>>     df.Show(100, false)
>>>     df.Collect()
>>>
>>>     df.Write().Mode("overwrite").
>>>         Format("parquet").
>>>         Save("file:///tmp/spark-connect-write-example-output.parquet")
>>>
>>>     df = spark.Read().Format("parquet").
>>>         Load("file:///tmp/spark-connect-write-example-output.parquet")
>>>     df.Show(100, false)
>>>
>>>     df.CreateTempView("view1", true, false)
>>>     df, _ = spark.Sql("select count, word from view1 order by count")
>>> }
>>>
>>>
>>> Many thanks to Martin, Hyukjin, Ruifeng and Denny for creating and
>>> working together on this repo! More contributors are welcome :)
>>>
>>> Best,
>>> Bo
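The Go example above assumes a Spark Connect server is already listening at sc://localhost:15002. One way to start one locally, sketched from a Spark 3.4+ binary distribution (the package coordinates are an assumption and are version- and Scala-version-dependent):

```
# From the root of a Spark 3.4+ distribution: start a local Spark Connect
# server on the default port 15002, then run the Go example against it.
./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.1
go run main.go
```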