Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Gengliang Wang
Dear Apache Spark community,

We are delighted to announce the launch of a groundbreaking tool that aims
to make Apache Spark more user-friendly and accessible: the English SDK
<https://github.com/databrickslabs/pyspark-ai/>. Powered by Generative AI,
the English SDK lets you carry out complex tasks with simple English
instructions. It was announced recently at the Data+AI Summit
<https://www.youtube.com/watch?v=yj7XlTB1Jvc&t=511s> and introduced in a
detailed blog post
<https://www.databricks.com/blog/introducing-english-new-programming-language-apache-spark>.
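
For a quick taste, here is a minimal sketch based on the examples in the
repository's README (the data source URL is taken from there; an LLM
backend such as an OpenAI API key is assumed, and the API may evolve):

    # Minimal sketch following the pyspark-ai README; assumes an LLM
    # backend (e.g., OPENAI_API_KEY) is configured. API details may
    # change as the project evolves.
    from pyspark_ai import SparkAI

    spark_ai = SparkAI()
    spark_ai.activate()  # enables the `ai` accessor on DataFrames

    # Build a DataFrame from a web page, described by its URL
    auto_df = spark_ai.create_df(
        "https://www.carpro.com/blog/full-year-2022-national-auto-sales-by-brand"
    )

    # Transform and explain with plain-English instructions
    top_growth = auto_df.ai.transform("brand with the highest growth")
    top_growth.ai.explain()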

Now, we need your invaluable feedback and contributions. The aim of the
English SDK is not only to simplify and enrich your Apache Spark experience
but also to grow with the community. We're calling upon Spark developers
and users to explore this innovative tool, offer your insights, provide
feedback, and contribute to its evolution.

You can find more details about the SDK and usage examples on the GitHub
repository https://github.com/databrickslabs/pyspark-ai/. If you have any
feedback or suggestions, please feel free to open an issue directly on the
repository. We are actively monitoring the issues and value your insights.

We also welcome pull requests and are eager to see how you might extend or
refine this tool. Let's come together to continue making Apache Spark more
approachable and user-friendly.

Thank you in advance for your attention and involvement. We look forward to
hearing your thoughts and seeing your contributions!

Best,
Gengliang Wang


Re: Stickers and Swag

2022-06-14 Thread Gengliang Wang
FYI, now you can find the shopping information on
https://spark.apache.org/community as well :)


Gengliang



> On Jun 14, 2022, at 7:47 PM, Hyukjin Kwon  wrote:
> 
> Woohoo
> 
> On Tue, 14 Jun 2022 at 15:04, Xiao Li wrote:
> Hi, all, 
> 
> The ASF has an official store at RedBubble that Apache Community
> Development (ComDev) runs. If you are interested in buying Spark Swag, 70
> products featuring the Spark logo are available:
> https://www.redbubble.com/shop/ap/113203780
> 
> Go Spark! 
> 
> Xiao



Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-29 Thread Gengliang Wang
Thanks to Huaxin for driving the release!

Fengyu, this is a known issue that will be fixed in the 3.3 release.
Currently, the "hadoop3.2" label means Hadoop 3.2 or higher. See the thread
https://lists.apache.org/thread/yov8xsggo3g2qr2p1rrr2xtps25wkbvj for
more details.


On Sat, Jan 29, 2022 at 3:26 PM FengYu Cao  wrote:

> https://spark.apache.org/downloads.html
>
> The *2. Choose a package type:* menu shows "Pre-built for Hadoop 3.3",
>
> but the download link is *spark-3.2.1-bin-hadoop3.2.tgz*.
>
> Does this need an update?
>
> On Sat, Jan 29, 2022 at 14:26, L. C. Hsieh wrote:
>
>> Thanks Huaxin for the 3.2.1 release!
>>
>> On Fri, Jan 28, 2022 at 10:14 PM Dongjoon Hyun 
>> wrote:
>> >
>> > Thank you again, Huaxin!
>> >
>> > Dongjoon.
>> >
>> > On Fri, Jan 28, 2022 at 6:23 PM DB Tsai  wrote:
>> >>
>> >> Thank you, Huaxin for the 3.2.1 release!
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On Jan 28, 2022, at 5:45 PM, Chao Sun  wrote:
>> >>
>> >> 
>> >> Thanks Huaxin for driving the release!
>> >>
>> >> On Fri, Jan 28, 2022 at 5:37 PM Ruifeng Zheng 
>> wrote:
>> >>>
>> >>> It's great!
>> >>> Congrats and thanks, Huaxin!
>> >>>
>> >>>
>> >>> -- Original Message --
>> >>> From: "huaxin gao" ;
>> >>> Sent: Saturday, Jan 29, 2022, 9:07 AM
>> >>> To: "dev"; "user";
>> >>> Subject: [ANNOUNCE] Apache Spark 3.2.1 released
>> >>>
>> >>> We are happy to announce the availability of Spark 3.2.1!
>> >>>
>> >>> Spark 3.2.1 is a maintenance release containing stability fixes. This
>> >>> release is based on the branch-3.2 maintenance branch of Spark. We
>> >>> strongly recommend that all 3.2 users upgrade to this stable release.
>> >>>
>> >>> To download Spark 3.2.1, head over to the download page:
>> >>> https://spark.apache.org/downloads.html
>> >>>
>> >>> To view the release notes:
>> >>> https://spark.apache.org/releases/spark-release-3-2-1.html
>> >>>
>> >>> We would like to acknowledge all community members for contributing
>> >>> to this release. This release would not have been possible without you.
>> >>>
>> >>> Huaxin Gao
>>
>
> --
> *camper42 (曹丰宇)*
> Douban, Inc.
>
> Mobile: +86 15691996359
> E-mail:  camper.x...@gmail.com
>


Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi Prasad,

Thanks for reporting the issue. The link was wrong. It should be fixed now.
Could you try again on https://spark.apache.org/downloads.html?

On Tue, Oct 19, 2021 at 10:53 PM Prasad Paravatha <
prasad.parava...@gmail.com> wrote:

>
> https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz
>
> FYI, unable to download from this location.
> Also, I don’t see Hadoop 3.3 version in the dist
>
>
> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD <
> meikel.b...@bertelsmann.de> wrote:
>
> 
>
> Many thanks! 
>
>
>
> *From:* Gengliang Wang 
> *Sent:* Tuesday, October 19, 2021 16:16
> *To:* dev ; user 
> *Subject:* [ANNOUNCE] Apache Spark 3.2.0
>
>
>
> Hi all,
>
>
>
> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
> contribution from the open-source community, this release managed to
> resolve in excess of 1,700 Jira tickets.
>
>
>
> We'd like to thank our contributors and users for their contributions and
> early feedback to this release. This release would not have been possible
> without you.
>
>
>
> To download Spark 3.2.0, head over to the download page:
> https://spark.apache.org/downloads.html
>
>
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-0.html
>
>


[ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Gengliang Wang
Hi all,

Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
contribution from the open-source community, this release managed to
resolve in excess of 1,700 Jira tickets.

We'd like to thank our contributors and users for their contributions and
early feedback to this release. This release would not have been possible
without you.

To download Spark 3.2.0, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-0.html


Re: spark 3.2 release date

2021-08-30 Thread Gengliang Wang
Hi,

There is no exact release date yet. As per
https://spark.apache.org/release-process.html, we need a Release Candidate
that passes the release vote.
Spark 3.2 RC1 failed recently. I will cut RC2 after
https://issues.apache.org/jira/browse/SPARK-36619 is resolved.


Gengliang Wang




> On Aug 31, 2021, at 12:06 PM, infa elance  wrote:
> 
> What is the expected ballpark release date of Spark 3.2?
> 
> Thanks and Regards,
> Ajay.



Re: [ANNOUNCE] Announcing Apache Spark 3.0.1

2020-09-11 Thread Gengliang Wang
Congrats!
Thanks for the work, Ruifeng!


On Fri, Sep 11, 2020 at 9:51 PM Takeshi Yamamuro 
wrote:

> Congrats and thanks, Ruifeng!
>
>
> On Fri, Sep 11, 2020 at 9:50 PM Dongjoon Hyun 
> wrote:
>
>> It's great. Thank you, Ruifeng!
>>
>> Bests,
>> Dongjoon.
>>
>> On Fri, Sep 11, 2020 at 1:54 AM 郑瑞峰  wrote:
>>
>>> Hi all,
>>>
>>> We are happy to announce the availability of Spark 3.0.1!
>>> Spark 3.0.1 is a maintenance release containing stability fixes. This
>>> release is based on the branch-3.0 maintenance branch of Spark. We
>>> strongly recommend that all 3.0 users upgrade to this stable release.
>>>
>>> To download Spark 3.0.1, head over to the download page:
>>> http://spark.apache.org/downloads.html
>>>
>>> Note that you might need to clear your browser cache or use
>>> `Private`/`Incognito` mode, depending on your browser.
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-3-0-1.html
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release. This release would not have been possible without you.
>>>
>>>
>>> Thanks,
>>> Ruifeng Zheng
>>>
>>>
>
> --
> ---
> Takeshi Yamamuro
>


Re: Inner join with the table itself

2018-01-15 Thread Gengliang Wang
Hi Michael,

You can use `EXPLAIN` to see how your query is optimized:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/explain.html

I believe your query is planned as an actual cross join, which is usually
very slow to execute.

To get rid of the error, you can set `spark.sql.crossJoin.enabled` to true.
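
For illustration, here is a minimal PySpark sketch of both suggestions (the
DataFrame and its `key`/`parent` columns are hypothetical stand-ins for
your table; the config key is the Spark 2.x name):

    # Hypothetical self-join example; `key`/`parent` stand in for the
    # real join columns.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    spark.conf.set("spark.sql.crossJoin.enabled", "true")

    df = spark.createDataFrame([(1, 0), (2, 1)], ["key", "parent"])

    # Aliasing lets you self-join without renaming every column.
    joined = df.alias("a").join(df.alias("b"),
                                col("a.key") == col("b.parent"))
    joined.explain(True)  # parsed, analyzed, optimized, physical plans

Aliasing the DataFrame, as above, is also a common alternative to renaming
all the columns before a self-join.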


> On Jan 15, 2018, at 6:09 PM, Jacek Laskowski wrote:
> 
> Hi Michael,
> 
> -dev +user
> 
> What's the query? How do you "fool spark"?
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
> On Mon, Jan 15, 2018 at 10:23 AM, Michael Shtelma wrote:
> Hi all,
> 
> If I try joining the table with itself using join columns, I am
> getting the following error:
> "Join condition is missing or trivial. Use the CROSS JOIN syntax to
> allow cartesian products between these relations.;"
> 
> This is not true: my join is not trivial and is not a real cross
> join. I am providing a join condition and expect to get maybe a couple
> of joined rows for each row in the original table.
> 
> There is a workaround for this, which involves renaming all the columns
> in the source data frame and only afterwards proceeding with the join.
> This allows us to fool Spark.
> 
> Now I am wondering whether there is a better way around this problem. I
> do not like the idea of renaming the columns because it makes it really
> difficult to keep track of the column names in the result data frames.
> Is it possible to deactivate this check?
> 
> Thanks,
> Michael
> 



Spark-avro 4.0.0 is released

2017-11-10 Thread Gengliang Wang
The 4.0.0 release adds support for Spark 2.2. The published artifact is
compatible with both Spark 2.1 and 2.2.
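
For reference, a minimal usage sketch (the file paths are hypothetical;
with Spark 2.x the package can be loaded via
--packages com.databricks:spark-avro_2.11:4.0.0):

    # Minimal read/write sketch for spark-avro 4.0.0; paths are
    # hypothetical. Launch with:
    #   spark-submit --packages com.databricks:spark-avro_2.11:4.0.0 app.py
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.read.format("com.databricks.spark.avro").load("/tmp/in.avro")
    df.write.format("com.databricks.spark.avro").save("/tmp/out")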

New Features:

   - Support for Spark 2.2 (#242): resolves a compatibility issue with the
   datasource write API changes.

Bug fixes:

   - Fix name conflict in nested records (#249)


Release history:

   - https://github.com/databricks/spark-avro/releases


Thanks for the contributions from Imran Rashid, Gerard Solà and Jacky Shen!

-- 
Wang Gengliang
Software Engineer
Databricks Inc.