Re: DataFrame.mapInArrow

2021-11-10 Thread Hyukjin Kwon
Sure, thanks Holden :-).

On Thu, 11 Nov 2021 at 15:53, Holden Karau  wrote:

> Sorry I've been busy; I'll try to take a look tomorrow. Excited to see
> this progress, though :)
>
> On Wed, Nov 10, 2021 at 9:01 PM Hyukjin Kwon  wrote:
>
>> Last reminder: I plan to merge this in a few more days. Any feedback or
>> review would be much appreciated.
>>
>> On Tue, 9 Nov 2021 at 21:51, Hyukjin Kwon  wrote:
>>
>>> Hi dev,
>>>
>>> I proposed DataFrame.mapInArrow (
>>> https://github.com/apache/spark/pull/34505), which allows users to
>>> directly leverage Arrow batches to plug in other external systems easily.
>>>
>>> I would like to make sure this API design covers most use cases, and
>>> would like to hear any other feedback or opinions on it.
>>>
>>> I would appreciate any feedback on this.
>>>
>>> Thanks.
>>>
>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: DataFrame.mapInArrow

2021-11-10 Thread Holden Karau
Sorry I've been busy; I'll try to take a look tomorrow. Excited to see
this progress, though :)

On Wed, Nov 10, 2021 at 9:01 PM Hyukjin Kwon  wrote:

> Last reminder: I plan to merge this in a few more days. Any feedback or
> review would be much appreciated.
>
> On Tue, 9 Nov 2021 at 21:51, Hyukjin Kwon  wrote:
>
>> Hi dev,
>>
>> I proposed DataFrame.mapInArrow (
>> https://github.com/apache/spark/pull/34505), which allows users to
>> directly leverage Arrow batches to plug in other external systems easily.
>>
>> I would like to make sure this API design covers most use cases, and
>> would like to hear any other feedback or opinions on it.
>>
>> I would appreciate any feedback on this.
>>
>> Thanks.
>>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: DataFrame.mapInArrow

2021-11-10 Thread Hyukjin Kwon
Last reminder: I plan to merge this in a few more days. Any feedback or
review would be much appreciated.

On Tue, 9 Nov 2021 at 21:51, Hyukjin Kwon  wrote:

> Hi dev,
>
> I proposed DataFrame.mapInArrow (
> https://github.com/apache/spark/pull/34505), which allows users to
> directly leverage Arrow batches to plug in other external systems easily.
>
> I would like to make sure this API design covers most use cases, and
> would like to hear any other feedback or opinions on it.
>
> I would appreciate any feedback on this.
>
> Thanks.
>


Re: ASF board report draft for November

2021-11-10 Thread Matei Zaharia
Sounds good, I’ll fix that.

Matei

> On Nov 9, 2021, at 12:39 AM, Mich Talebzadeh wrote:
> 
> Hi,
> 
> Just a minor modification:
> 
> Under Description:
> 
> Apache Spark is a fast and general engine for large-scale data processing.
> 
> It should read:
> 
> Apache Spark is a fast and general-purpose engine for large-scale data
> processing.
> 
> HTH
> 
> On Tue, 9 Nov 2021 at 08:06, Matei Zaharia wrote:
> 
> Hi all,
> 
> Our ASF board report needs to be submitted again this Wednesday (November
> 10). I wrote a draft covering the major things that happened in the past
> three months; let me know if I missed anything.
> 
> ===
> 
> Description:
> 
> Apache Spark is a fast and general engine for large-scale data processing. It
> offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set
> of libraries including stream processing, machine learning, and graph
> analytics.
> 
> Issues for the board:
> 
> - None
> 
> Project status:
> 
> - We recently released Apache Spark 3.2, a feature release that adds several
>   large pieces of functionality. Spark 3.2 includes a new Pandas API for Apache
>   Spark based on the Koalas project, a new push-based shuffle implementation, a
>   more efficient RocksDB state store for Structured Streaming, native support
>   for session windows, error message standardization, and significant
>   improvements to Spark SQL, such as the use of adaptive query execution by
>   default and GA status for the ANSI SQL language mode.
> 
> - We updated the Apache Spark homepage with a new design and more examples.
> 
> - We added one new committer, Chao Sun, in October.
> 
> Trademarks:
> 
> - No changes since the last report.
> 
> Latest releases:
> 
> - Spark 3.2.0 was released on October 13, 2021.
> - Spark 3.1.2 was released on June 23, 2021.
> - Spark 3.0.3 was released on June 1, 2021.
> 
> Committers and PMC:
> 
> - The latest committer was added on November 5, 2021 (Chao Sun).
> - The latest PMC member was added on June 20, 2021 (Kousuke Saruta).
> 
> ===