Re: Slack for PySpark users

2023-03-30 Thread Jungtaek Lim
I'm reading through the page "Briefing: The Apache Way", and in the "Open
Communications" section, the restriction to communicate inside ASF INFRA
(the mailing lists) is mostly about code and decision-making.
https://www.apache.org/theapacheway/#what-makes-the-apache-way-so-hard-to-define

It's unavoidable that "users" prefer an alternative communication
mechanism to the user mailing list. Before the Stack Overflow days,
there was a meaningful number of questions on user@. It's just
impossible to make them go back and post to the user mailing list.

We just need to make sure that Slack is not employed to move
discussions about development, the direction of the project, etc., which
must happen on dev@/private@. The Slack proposal in this thread does not
seem to aim for that.


On Fri, Mar 31, 2023 at 7:00 AM Mich Talebzadeh 
wrote:

> Good discussions and proposals all around.
>
> I have used Slack in anger on a customer site before. For small and medium
> size groups it is good and affordable. Alternatives have been suggested as
> well, so those who like investigative search can agree and come up with a
> free one.
> I am inclined to agree with Bjørn that Slack has more of a social
> dimension than the mailing list. It is akin to a sports club using
> WhatsApp groups for communication. Remember we were originally looking for
> space for webinars, including Spark on LinkedIn, which Denny Lee suggested.
> I think Slack and the mailing lists can coexist happily. On a more serious
> note, when I joined the user group back in 2015-2016, there was a lot of
> traffic. Currently we hardly get any mail, fewer than 5 a day. So having
> a Slack-type medium may improve member participation.
>
> So +1 from me as well.
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Thu, 30 Mar 2023 at 22:19, Denny Lee  wrote:
>
>> +1.
>>
>> To Shani’s point, there are multiple OSS projects that use the free Slack
>> version - top of mind include Delta, Presto, Flink, Trino, Datahub, MLflow,
>> etc.
>>
>> On Thu, Mar 30, 2023 at 14:15  wrote:
>>
>>> Hey everyone,
>>>
>>> I think we should remain on a free program in slack.
>>>
>>> In my option the free program is more then enough, the only down side is
>>> we could only see the last 90 days messages.
>>>
>>> From what I know the Airflow community (which has strong active
>>> community in slack) also use the free program (You can tell by the 90 days
>>> limit notice in their workspace).
>>>
>>> You can find the pricing and features comparison between the slack
>>> programs here  .
>>>
>>> Have a great day,
>>> Shani
>>>
>>> On 30 Mar 2023, at 23:38, Mridul Muralidharan  wrote:
>>>
>>> 
>>>
>>>
>>> Thanks for flagging the concern Dongjoon, I was not aware of the
>>> discussion - but I can understand the concern.
>>> Would be great if you or Matei could update the thread on the result of
>>> deliberations, once it reaches a logical consensus: before we set up
>>> official policy around it.
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Thu, Mar 30, 2023 at 4:23 PM Bjørn Jørgensen <
>>> bjornjorgen...@gmail.com> wrote:
>>>
>>>> I like the idea of having a talk channel. It can make it easier for
>>>> everyone to say hello. Or to dare to ask about small or big matters that
>>>> you would not have dared to ask about before on mailing lists.
>>>> But then there is the price and what is the best for an open source
>>>> project.
>>>>
>>>> The price for using slack is expensive.
>>>> Right now for those that have join spark slack
>>>> $8.75 USD
>>>> 72 members
>>>> 1 month
>>>> $630 USD
>>>>
>>>> https://app.slack.com/plans/T04URTRBZ1R/checkout/form?entry_point=hero_banner_upgrade_cta=2
>>>>
>>>> And they - slack does not have an option for open source projects.
>>>>
>>>> There seems to be some alternatives for open source software. I have
>>>> not tried it.
>>>> Like https://www.rocket.chat/blog/slack-open-source-alternatives
>>>>
>>>> rocket chat is open source https://github.com/RocketChat/Rocket.Chat
>>>>
>>>> On Thu, 30 Mar 2023 at 18:54, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Dongjoon
>>>>>
>>>>> to your points if I may
>>>>>
>>>>> - Do you have any reference from other official ASF-related Slack
>>>>> channels?
>>>>>    No, I don't have any reference from other official ASF-related
>>>>> Slack channels because I don't think 

Re: Creating InMemory relations with data in ColumnarBatches

2023-03-30 Thread Mich Talebzadeh
Is this purely for performance reasons?

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited






On Thu, 30 Mar 2023 at 19:56, praveen sinha  wrote:

> Hi,
>
> I have been trying to implement InMemoryRelation based on spark
> ColumnarBatches, so far I have not been able to store the vectorised
> columnarbatch into the relation. Is there a way to achieve this without
> going with an intermediary representation like Arrow, so as to enable spark
> to do fast columnar aggregations in memory. The code so far, using just the
> high level APIs is as follows -
>
> ```
>   // Load CSV into a DataFrame
>   val csvDF: DataFrame = context.sqlctx.read
>     .format("com.databricks.spark.csv")
>     .option("header", "true")
>     .option("inferSchema", "true")
>     .load(csvFile)
>
>   // Create an in-memory relation using the schema from the CSV DataFrame
>   val relation = InMemoryRelation(
>     useCompression = true,
>     batchSize = 100,
>     storageLevel = StorageLevel.MEMORY_ONLY,
>     // Do I need to alter this to suggest columnar plans?
>     child = csvDF.queryExecution.sparkPlan,
>     tableName = Some("nyc_taxi"),
>     optimizedPlan = csvDF.queryExecution.optimizedPlan
>   )
>
>   // Create vectorized columnar batches
>   val rows = csvDF.collect()
>   import scala.collection.JavaConverters._
>   val vectorizedRows: ColumnarBatch =
>     ColumnVectorUtils.toBatch(csvDF.schema, MemoryMode.ON_HEAP,
>       rows.iterator.asJava)
>
>   // Store the vectorized rows in the relation (hypothetical call, not a Spark API)
>   //relation.store(vectorizedRows)
> ```
>
> Obviously the last line is the one which is not an API. Need help to
> understand if this approach can work and if it does, need help and pointers
> in trying to come up with how to implement this API using low level spark
> constructs.
>
> Thanks and Regards,
> Praveen
>


Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
Good discussions and proposals all around.

I have used Slack in anger on a customer site before. For small and medium
size groups it is good and affordable. Alternatives have been suggested as
well, so those who like investigative search can agree and come up with a
free one.
I am inclined to agree with Bjørn that Slack has more of a social
dimension than the mailing list. It is akin to a sports club using
WhatsApp groups for communication. Remember we were originally looking for
space for webinars, including Spark on LinkedIn, which Denny Lee suggested.
I think Slack and the mailing lists can coexist happily. On a more serious
note, when I joined the user group back in 2015-2016, there was a lot of
traffic. Currently we hardly get any mail, fewer than 5 a day. So having
a Slack-type medium may improve member participation.

So +1 from me as well.

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited






On Thu, 30 Mar 2023 at 22:19, Denny Lee  wrote:

> +1.
>
> To Shani’s point, there are multiple OSS projects that use the free Slack
> version - top of mind include Delta, Presto, Flink, Trino, Datahub, MLflow,
> etc.
>
> On Thu, Mar 30, 2023 at 14:15  wrote:
>
>> Hey everyone,
>>
>> I think we should remain on a free program in slack.
>>
>> In my option the free program is more then enough, the only down side is
>> we could only see the last 90 days messages.
>>
>> From what I know the Airflow community (which has strong active community
>> in slack) also use the free program (You can tell by the 90 days limit
>> notice in their workspace).
>>
>> You can find the pricing and features comparison between the slack
>> programs here  .
>>
>> Have a great day,
>> Shani
>>
>> On 30 Mar 2023, at 23:38, Mridul Muralidharan  wrote:
>>
>> 
>>
>>
>> Thanks for flagging the concern Dongjoon, I was not aware of the
>> discussion - but I can understand the concern.
>> Would be great if you or Matei could update the thread on the result of
>> deliberations, once it reaches a logical consensus: before we set up
>> official policy around it.
>>
>> Regards,
>> Mridul
>>
>>
>> On Thu, Mar 30, 2023 at 4:23 PM Bjørn Jørgensen 
>> wrote:
>>
>>> I like the idea of having a talk channel. It can make it easier for
>>> everyone to say hello. Or to dare to ask about small or big matters that
>>> you would not have dared to ask about before on mailing lists.
>>> But then there is the price and what is the best for an open source
>>> project.
>>>
>>> The price for using slack is expensive.
>>> Right now for those that have join spark slack
>>> $8.75 USD
>>> 72 members
>>> 1 month
>>> $630 USD
>>>
>>> https://app.slack.com/plans/T04URTRBZ1R/checkout/form?entry_point=hero_banner_upgrade_cta=2
>>>
>>> And they - slack does not have an option for open source projects.
>>>
>>> There seems to be some alternatives for open source software. I have not
>>> tried it.
>>> Like https://www.rocket.chat/blog/slack-open-source-alternatives
>>>
>>> 
>>>
>>>
>>> rocket chat is open source https://github.com/RocketChat/Rocket.Chat
>>>
>>> On Thu, 30 Mar 2023 at 18:54, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Dongjoon
>>>>
>>>> to your points if I may
>>>>
>>>> - Do you have any reference from other official ASF-related Slack
>>>> channels?
>>>>    No, I don't have any reference from other official ASF-related Slack
>>>> channels because I don't think that matters. However, I stand corrected
>>>> - To be clear, I intentionally didn't refer to any specific mailing
>>>> list because we didn't set up any rule here yet.
>>>>    fair enough
>>>>
>>>> going back to your original point
>>>>
>>>> ..There is a concern expressed by ASF board because recent Slack
>>>> activities created an isolated silo outside of ASF mailing list archive...
>>>> Well, there are activities on Spark and indeed other open source
>>>> software everywhere. One way or other they do help getting community
>>>> (inside the user groups and other) to get interested and involved. Slack
>>>> happens to be one of them.
>>>> I am of the opinion that creating such silos is already a reality and
>>>> we ought to be pragmatic. Unless there is an overriding reason, we should
>>>> embrace it as slack can co-exist with the other mailing lists and channels
>>>> like linkedin etc.
>>>>
>>>> Hope this clarifies my position
>>>>
>>>> Mich Talebzadeh,
>>>> Lead Solutions Architect/Engineering Lead
>>>> Palantir Technologies Limited


Re: Slack for PySpark users

2023-03-30 Thread Denny Lee
+1.

To Shani’s point, there are multiple OSS projects that use the free Slack
version - top of mind include Delta, Presto, Flink, Trino, Datahub, MLflow,
etc.

On Thu, Mar 30, 2023 at 14:15  wrote:

> Hey everyone,
>
> I think we should remain on a free program in slack.
>
> In my option the free program is more then enough, the only down side is
> we could only see the last 90 days messages.
>
> From what I know the Airflow community (which has strong active community
> in slack) also use the free program (You can tell by the 90 days limit
> notice in their workspace).
>
> You can find the pricing and features comparison between the slack
> programs here  .
>
> Have a great day,
> Shani
>
> On 30 Mar 2023, at 23:38, Mridul Muralidharan  wrote:
>
> 
>
>
> Thanks for flagging the concern Dongjoon, I was not aware of the
> discussion - but I can understand the concern.
> Would be great if you or Matei could update the thread on the result of
> deliberations, once it reaches a logical consensus: before we set up
> official policy around it.
>
> Regards,
> Mridul
>
>
> On Thu, Mar 30, 2023 at 4:23 PM Bjørn Jørgensen 
> wrote:
>
>> I like the idea of having a talk channel. It can make it easier for
>> everyone to say hello. Or to dare to ask about small or big matters that
>> you would not have dared to ask about before on mailing lists.
>> But then there is the price and what is the best for an open source
>> project.
>>
>> The price for using slack is expensive.
>> Right now for those that have join spark slack
>> $8.75 USD
>> 72 members
>> 1 month
>> $630 USD
>>
>> https://app.slack.com/plans/T04URTRBZ1R/checkout/form?entry_point=hero_banner_upgrade_cta=2
>>
>> And they - slack does not have an option for open source projects.
>>
>> There seems to be some alternatives for open source software. I have not
>> tried it.
>> Like https://www.rocket.chat/blog/slack-open-source-alternatives
>>
>> 
>>
>>
>> rocket chat is open source https://github.com/RocketChat/Rocket.Chat
>>
>> On Thu, 30 Mar 2023 at 18:54, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Dongjoon
>>>
>>> to your points if I may
>>>
>>> - Do you have any reference from other official ASF-related Slack
>>> channels?
>>>No, I don't have any reference from other official ASF-related Slack
>>> channels because I don't think that matters. However, I stand corrected
>>> - To be clear, I intentionally didn't refer to any specific mailing list
>>> because we didn't set up any rule here yet.
>>>fair enough
>>>
>>> going back to your original point
>>>
>>> ..There is a concern expressed by ASF board because recent Slack
>>> activities created an isolated silo outside of ASF mailing list archive...
>>> Well, there are activities on Spark and indeed other open source
>>> software everywhere. One way or other they do help getting community
>>> (inside the user groups and other) to get interested and involved. Slack
>>> happens to be one of them.
>>> I am of the opinion that creating such silos is already a reality and we
>>> ought to be pragmatic. Unless there is an overriding reason, we should
>>> embrace it as slack can co-exist with the other mailing lists and channels
>>> like linkedin etc.
>>>
>>> Hope this clarifies my position
>>>
>>> Mich Talebzadeh,
>>> Lead Solutions Architect/Engineering Lead
>>> Palantir Technologies Limited
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, 30 Mar 2023 at 17:28, Dongjoon Hyun 
>>> wrote:
>>>
>>>> To Mich.
>>>> - Do you have any reference from other official ASF-related Slack
>>>> channels?
>>>> - To be clear, I intentionally didn't refer to any specific mailing
>>>> list because we didn't set up any rule here yet.
>>>>
>>>> To Xiao. I understand what you mean. That's the reason why I added
>>>> Matei from your side.
>>>> > I did not see an objection from the ASF board.
>>>>
>>>> There is on-going discussion about the communication channels outside
>>>> ASF email which is specifically concerning Slack.
>>>> Please hold on any official action for this topic. We will know how to
>>>> support it seamlessly.
>>>>
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>>>>
>>>>> Hi, Dongjoon,
>>>>>
>>>>> The other communities (e.g., Pinot, Druid, Flink) created their own
>>>>> Slack workspaces last year. I did not see an objection from the ASF board.
>>>>> At the same time, Slack workspaces are very popular and useful in most
>>>>> non-ASF 

Re: Slack for PySpark users

2023-03-30 Thread shani . alishar
Hey everyone,

I think we should remain on the free plan in Slack.

In my opinion the free plan is more than enough; the only downside is
that we can only see the last 90 days of messages.

From what I know, the Airflow community (which has a strong, active community
on Slack) also uses the free plan (you can tell by the 90-day limit
notice in their workspace).

You can find the pricing and feature comparison between the Slack
plans here  .

Have a great day,
Shani

On 30 Mar 2023, at 23:38, Mridul Muralidharan  wrote:

> Thanks for flagging the concern Dongjoon, I was not aware of the
> discussion - but I can understand the concern.
> Would be great if you or Matei could update the thread on the result of
> deliberations, once it reaches a logical consensus: before we set up
> official policy around it.
>
> Regards,
> Mridul
>
> On Thu, Mar 30, 2023 at 4:23 PM Bjørn Jørgensen  wrote:
>
>> I like the idea of having a talk channel. It can make it easier for
>> everyone to say hello. Or to dare to ask about small or big matters that
>> you would not have dared to ask about before on mailing lists.
>> But then there is the price and what is the best for an open source
>> project.
>>
>> The price for using slack is expensive.
>> Right now for those that have join spark slack
>> $8.75 USD
>> 72 members
>> 1 month
>> $630 USD
>>
>> https://app.slack.com/plans/T04URTRBZ1R/checkout/form?entry_point=hero_banner_upgrade_cta=2
>>
>> And they - slack does not have an option for open source projects.
>>
>> There seems to be some alternatives for open source software. I have not
>> tried it.
>> Like https://www.rocket.chat/blog/slack-open-source-alternatives
>>
>> rocket chat is open source https://github.com/RocketChat/Rocket.Chat
>>
>> On Thu, 30 Mar 2023 at 18:54, Mich Talebzadeh  wrote:
>>
>>> Hi Dongjoon
>>>
>>> to your points if I may
>>>
>>> - Do you have any reference from other official ASF-related Slack
>>> channels?
>>>    No, I don't have any reference from other official ASF-related Slack
>>> channels because I don't think that matters. However, I stand corrected
>>> - To be clear, I intentionally didn't refer to any specific mailing list
>>> because we didn't set up any rule here yet.
>>>    fair enough
>>>
>>> going back to your original point
>>>
>>> ..There is a concern expressed by ASF board because recent Slack
>>> activities created an isolated silo outside of ASF mailing list archive...
>>> Well, there are activities on Spark and indeed other open source
>>> software everywhere. One way or other they do help getting community
>>> (inside the user groups and other) to get interested and involved. Slack
>>> happens to be one of them.
>>> I am of the opinion that creating such silos is already a reality and we
>>> ought to be pragmatic. Unless there is an overriding reason, we should
>>> embrace it as slack can co-exist with the other mailing lists and channels
>>> like linkedin etc.
>>>
>>> Hope this clarifies my position
>>>
>>> Mich Talebzadeh,
>>> Lead Solutions Architect/Engineering Lead
>>> Palantir Technologies Limited
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>> Disclaimer: Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>> On Thu, 30 Mar 2023 at 17:28, Dongjoon Hyun  wrote:
>>>
>>>> To Mich.
>>>> - Do you have any reference from other official ASF-related Slack
>>>> channels?
>>>> - To be clear, I intentionally didn't refer to any specific mailing
>>>> list because we didn't set up any rule here yet.
>>>>
>>>> To Xiao. I understand what you mean. That's the reason why I added
>>>> Matei from your side.
>>>> > I did not see an objection from the ASF board.
>>>>
>>>> There is on-going discussion about the communication channels outside
>>>> ASF email which is specifically concerning Slack.
>>>> Please hold on any official action for this topic. We will know how to
>>>> support it seamlessly.
>>>>
>>>> Dongjoon.
>>>>
>>>> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>>>>
>>>>> Hi, Dongjoon,
>>>>>
>>>>> The other communities (e.g., Pinot, Druid, Flink) created their own
>>>>> Slack workspaces last year. I did not see an objection from the ASF board.
>>>>> At the same time, Slack workspaces are very popular and useful in most
>>>>> non-ASF open source communities. TBH, we are kind of late. I think we can
>>>>> do the same in our community?
>>>>>
>>>>> We can follow the guide when the ASF has an official process for ASF
>>>>> archiving. Since our PMC are the owner of the slack workspace, we can make
>>>>> a change based on the policy. WDYT?
>>>>>
>>>>> Xiao
>>>>>
>>>>> On Thu, Mar 30, 2023 at 09:03, Dongjoon Hyun  wrote:
>>>>>
>>>>>> Hi, Xiao and all.
>>>>>>
>>>>>> (cc Matei)
>>>>>>
>>>>>> Please hold on the vote.
>>>>>>
>>>>>> There is a concern expressed by ASF board because recent Slack
>>>>>> activities created an isolated silo outside of ASF mailing list archive.
>>>>>>
>>>>>> We need to establish a way to embrace it back to ASF archive before
>>>>>> starting anything official.
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> + @d...@spark.apache.org
>>>>>>>
>>>>>>> This is a good idea. The 

Re: Slack for PySpark users

2023-03-30 Thread Mridul Muralidharan
Thanks for flagging the concern, Dongjoon. I was not aware of the discussion,
but I can understand the concern.
It would be great if you or Matei could update the thread with the result of
the deliberations once they reach a logical consensus, before we set up an
official policy around it.

Regards,
Mridul


On Thu, Mar 30, 2023 at 4:23 PM Bjørn Jørgensen 
wrote:

> I like the idea of having a talk channel. It can make it easier for
> everyone to say hello. Or to dare to ask about small or big matters that
> you would not have dared to ask about before on mailing lists.
> But then there is the price and what is the best for an open source
> project.
>
> The price for using slack is expensive.
> Right now for those that have join spark slack
> $8.75 USD
> 72 members
> 1 month
> $630 USD
>
> https://app.slack.com/plans/T04URTRBZ1R/checkout/form?entry_point=hero_banner_upgrade_cta=2
>
> And they - slack does not have an option for open source projects.
>
> There seems to be some alternatives for open source software. I have not
> tried it.
> Like https://www.rocket.chat/blog/slack-open-source-alternatives
>
> [image: image.png]
>
> rocket chat is open source https://github.com/RocketChat/Rocket.Chat
>
> On Thu, 30 Mar 2023 at 18:54, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi Dongjoon
>>
>> to your points if I may
>>
>> - Do you have any reference from other official ASF-related Slack
>> channels?
>>No, I don't have any reference from other official ASF-related Slack
>> channels because I don't think that matters. However, I stand corrected
>> - To be clear, I intentionally didn't refer to any specific mailing list
>> because we didn't set up any rule here yet.
>>fair enough
>>
>> going back to your original point
>>
>> ..There is a concern expressed by ASF board because recent Slack
>> activities created an isolated silo outside of ASF mailing list archive...
>> Well, there are activities on Spark and indeed other open source software
>> everywhere. One way or other they do help getting community (inside the
>> user groups and other) to get interested and involved. Slack happens to be
>> one of them.
>> I am of the opinion that creating such silos is already a reality and we
>> ought to be pragmatic. Unless there is an overriding reason, we should
>> embrace it as slack can co-exist with the other mailing lists and channels
>> like linkedin etc.
>>
>> Hope this clarifies my position
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead
>> Palantir Technologies Limited
>>
>>
>>
>>
>>
>>
>> On Thu, 30 Mar 2023 at 17:28, Dongjoon Hyun 
>> wrote:
>>
>>> To Mich.
>>> - Do you have any reference from other official ASF-related Slack
>>> channels?
>>> - To be clear, I intentionally didn't refer to any specific mailing list
>>> because we didn't set up any rule here yet.
>>>
>>> To Xiao. I understand what you mean. That's the reason why I added Matei
>>> from your side.
>>> > I did not see an objection from the ASF board.
>>>
>>> There is on-going discussion about the communication channels outside
>>> ASF email which is specifically concerning Slack.
>>> Please hold on any official action for this topic. We will know how to
>>> support it seamlessly.
>>>
>>> Dongjoon.
>>>
>>>
>>> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>>>
>>>> Hi, Dongjoon,
>>>>
>>>> The other communities (e.g., Pinot, Druid, Flink) created their own
>>>> Slack workspaces last year. I did not see an objection from the ASF board.
>>>> At the same time, Slack workspaces are very popular and useful in most
>>>> non-ASF open source communities. TBH, we are kind of late. I think we can
>>>> do the same in our community?
>>>>
>>>> We can follow the guide when the ASF has an official process for ASF
>>>> archiving. Since our PMC are the owner of the slack workspace, we can make
>>>> a change based on the policy. WDYT?
>>>>
>>>> Xiao
>>>>
>>>>
>>>> On Thu, Mar 30, 2023 at 09:03, Dongjoon Hyun  wrote:

> Hi, Xiao and all.
>
> (cc Matei)
>
> Please hold on the vote.
>
> There is a concern expressed by ASF board because recent Slack
> activities created an isolated silo outside of ASF mailing list archive.
>
> We need to establish a way to embrace it back to ASF archive before
> starting anything official.
>
> Bests,
> Dongjoon.
>
>
>
> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>
>> +1
>>
>> + @d...@spark.apache.org 
>>
>> This is a good 

unsubscribe

2023-03-30 Thread Daniel Tavares de Santana
unsubscribe


Re: Slack for PySpark users

2023-03-30 Thread Bjørn Jørgensen
I like the idea of having a talk channel. It can make it easier for
everyone to say hello, or to dare to ask about small or big matters that
they would not have dared to ask about before on the mailing lists.
But then there is the price, and what is best for an open source project.

Using Slack is expensive.
Right now, for those that have joined the Spark Slack:
$8.75 USD per member x 72 members x 1 month = $630 USD
https://app.slack.com/plans/T04URTRBZ1R/checkout/form?entry_point=hero_banner_upgrade_cta=2

And Slack does not have an option for open source projects.

There seem to be some open source alternatives. I have not
tried them.
Like https://www.rocket.chat/blog/slack-open-source-alternatives

Rocket.Chat is open source: https://github.com/RocketChat/Rocket.Chat

On Thu, 30 Mar 2023 at 18:54, Mich Talebzadeh <
mich.talebza...@gmail.com> wrote:

> Hi Dongjoon
>
> to your points if I may
>
> - Do you have any reference from other official ASF-related Slack channels?
>No, I don't have any reference from other official ASF-related Slack
> channels because I don't think that matters. However, I stand corrected
> - To be clear, I intentionally didn't refer to any specific mailing list
> because we didn't set up any rule here yet.
>fair enough
>
> going back to your original point
>
> ..There is a concern expressed by ASF board because recent Slack
> activities created an isolated silo outside of ASF mailing list archive...
> Well, there are activities on Spark and indeed other open source software
> everywhere. One way or other they do help getting community (inside the
> user groups and other) to get interested and involved. Slack happens to be
> one of them.
> I am of the opinion that creating such silos is already a reality and we
> ought to be pragmatic. Unless there is an overriding reason, we should
> embrace it as slack can co-exist with the other mailing lists and channels
> like linkedin etc.
>
> Hope this clarifies my position
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>
>
>
>
> On Thu, 30 Mar 2023 at 17:28, Dongjoon Hyun 
> wrote:
>
>> To Mich.
>> - Do you have any reference from other official ASF-related Slack
>> channels?
>> - To be clear, I intentionally didn't refer to any specific mailing list
>> because we didn't set up any rule here yet.
>>
>> To Xiao. I understand what you mean. That's the reason why I added Matei
>> from your side.
>> > I did not see an objection from the ASF board.
>>
>> There is on-going discussion about the communication channels outside ASF
>> email which is specifically concerning Slack.
>> Please hold on any official action for this topic. We will know how to
>> support it seamlessly.
>>
>> Dongjoon.
>>
>>
>> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>>
>>> Hi, Dongjoon,
>>>
>>> The other communities (e.g., Pinot, Druid, Flink) created their own
>>> Slack workspaces last year. I did not see an objection from the ASF board.
>>> At the same time, Slack workspaces are very popular and useful in most
>>> non-ASF open source communities. TBH, we are kind of late. I think we can
>>> do the same in our community?
>>>
>>> We can follow the guide when the ASF has an official process for ASF
>>> archiving. Since our PMC are the owner of the slack workspace, we can make
>>> a change based on the policy. WDYT?
>>>
>>> Xiao
>>>
>>>
>>> On Thu, Mar 30, 2023 at 09:03, Dongjoon Hyun  wrote:
>>>
>>>> Hi, Xiao and all.
>>>>
>>>> (cc Matei)
>>>>
>>>> Please hold on the vote.
>>>>
>>>> There is a concern expressed by ASF board because recent Slack
>>>> activities created an isolated silo outside of ASF mailing list archive.
>>>>
>>>> We need to establish a way to embrace it back to ASF archive before
>>>> starting anything official.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>>
>>>> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>>>>
>>>>> +1
>>>>>
>>>>> + @d...@spark.apache.org 
>>>>>
>>>>> This is a good idea. The other Apache projects (e.g., Pinot, Druid,
>>>>> Flink) have created their own dedicated Slack workspaces for faster
>>>>> communication. We can do the same in Apache Spark. The Slack workspace
>>>>> will be maintained by the Apache Spark PMC. I propose to initiate a vote
>>>>> for the creation of a new Apache Spark Slack workspace. Does that sound good?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh  wrote:
>>>>>
>>>>>> I 

Creating InMemory relations with data in ColumnarBatches

2023-03-30 Thread praveen sinha
Hi,

I have been trying to implement an InMemoryRelation based on Spark
ColumnarBatches, but so far I have not been able to store the vectorised
ColumnarBatch in the relation. Is there a way to achieve this without
going through an intermediary representation like Arrow, so as to enable Spark
to do fast columnar aggregations in memory? The code so far, using just the
high-level APIs, is as follows:

```
  import org.apache.spark.memory.MemoryMode
  import org.apache.spark.sql.DataFrame
  import org.apache.spark.sql.execution.columnar.InMemoryRelation
  import org.apache.spark.sql.execution.vectorized.ColumnVectorUtils
  import org.apache.spark.sql.vectorized.ColumnarBatch
  import org.apache.spark.storage.StorageLevel

  // Load CSV into a DataFrame
  val csvDF: DataFrame = context.sqlctx.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(csvFile)

  // Create an in-memory relation using the schema from the CSV DataFrame
  val relation = InMemoryRelation(
    useCompression = true,
    batchSize = 100,
    storageLevel = StorageLevel.MEMORY_ONLY,
    // Do I need to alter this to suggest columnar plans?
    child = csvDF.queryExecution.sparkPlan,
    tableName = Some("nyc_taxi"),
    optimizedPlan = csvDF.queryExecution.optimizedPlan
  )

  // Create vectorized columnar batches
  val rows = csvDF.collect()
  import scala.collection.JavaConverters._
  val vectorizedRows: ColumnarBatch =
    ColumnVectorUtils.toBatch(csvDF.schema, MemoryMode.ON_HEAP,
      rows.iterator.asJava)

  // Store the vectorized rows in the relation (hypothetical call, not a Spark API)
  //relation.store(vectorizedRows)
```

Obviously the last line is not an existing API. I need help understanding
whether this approach can work and, if it does, pointers on how to
implement such an API using low-level Spark constructs.

Thanks and Regards,
Praveen


Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
Hi Dongjoon

To your points, if I may:

- Do you have any reference from other official ASF-related Slack channels?
   No, I don't have any reference from other official ASF-related Slack
channels because I don't think that matters. However, I stand corrected.
- To be clear, I intentionally didn't refer to any specific mailing list
because we didn't set up any rule here yet.
   Fair enough.

Going back to your original point:

..There is a concern expressed by ASF board because recent Slack activities
created an isolated silo outside of ASF mailing list archive...
Well, there are activities around Spark, and indeed other open source
software, everywhere. One way or another, they help get the community
(inside the user groups and beyond) interested and involved. Slack happens
to be one of them.
I am of the opinion that such silos are already a reality and we ought to
be pragmatic. Unless there is an overriding reason not to, we should
embrace it, as Slack can co-exist with the mailing lists and other channels
like LinkedIn.

Hope this clarifies my position.

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 30 Mar 2023 at 17:28, Dongjoon Hyun  wrote:

> To Mich.
> - Do you have any reference from other official ASF-related Slack channels?
> - To be clear, I intentionally didn't refer to any specific mailing list
> because we didn't set up any rule here yet.
>
> To Xiao. I understand what you mean. That's the reason why I added Matei
> from your side.
> > I did not see an objection from the ASF board.
>
> There is on-going discussion about the communication channels outside ASF
> email which is specifically concerning Slack.
> Please hold on any official action for this topic. We will know how to
> support it seamlessly.
>
> Dongjoon.
>
>
> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>
>> Hi, Dongjoon,
>>
>> The other communities (e.g., Pinot, Druid, Flink) created their own Slack
>> workspaces last year. I did not see an objection from the ASF board. At the
>> same time, Slack workspaces are very popular and useful in most non-ASF
>> open source communities. TBH, we are kind of late. I think we can do the
>> same in our community?
>>
>> We can follow the guide when the ASF has an official process for ASF
>> archiving. Since our PMC are the owner of the slack workspace, we can make
>> a change based on the policy. WDYT?
>>
>> Xiao
>>
>>
>> On Thu, Mar 30, 2023 at 09:03, Dongjoon Hyun wrote:
>>
>>> Hi, Xiao and all.
>>>
>>> (cc Matei)
>>>
>>> Please hold on the vote.
>>>
>>> There is a concern expressed by ASF board because recent Slack
>>> activities created an isolated silo outside of ASF mailing list archive.
>>>
>>> We need to establish a way to embrace it back to ASF archive before
>>> starting anything official.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>>
>>> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>>>
 +1

 + @d...@spark.apache.org 

 This is a good idea. The other Apache projects (e.g., Pinot, Druid,
 Flink) have created their own dedicated Slack workspaces for faster
 communication. We can do the same in Apache Spark. The Slack workspace will
 be maintained by the Apache Spark PMC. I propose to initiate a vote for the
 creation of a new Apache Spark Slack workspace. Does that sound good?

 Cheers,

 Xiao







 On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:

> I created one at slack called pyspark
>
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any loss, damage or destruction of data or any other property which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary damages
> arising from such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Mar 2023 at 03:52, asma zgolli 
> wrote:
>
>> +1 good idea, I d like to join as well.
>>
>> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>>
>>> Please let us know when the channel is created. I'd like to join :)
>>>
>>> Thank You & Best Regards
>>> Winston Lai

Re: Slack for PySpark users

2023-03-30 Thread Dongjoon Hyun
To Mich.
- Do you have any reference from other official ASF-related Slack channels?
- To be clear, I intentionally didn't refer to any specific mailing list
because we didn't set up any rule here yet.

To Xiao. I understand what you mean. That's the reason why I added Matei
from your side.
> I did not see an objection from the ASF board.

There is an ongoing discussion about communication channels outside ASF
email, specifically concerning Slack.
Please hold off on any official action on this topic. Once that discussion
settles, we will know how to support it seamlessly.

Dongjoon.


On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:

> Hi, Dongjoon,
>
> The other communities (e.g., Pinot, Druid, Flink) created their own Slack
> workspaces last year. I did not see an objection from the ASF board. At the
> same time, Slack workspaces are very popular and useful in most non-ASF
> open source communities. TBH, we are kind of late. I think we can do the
> same in our community?
>
> We can follow the guide when the ASF has an official process for ASF
> archiving. Since our PMC are the owner of the slack workspace, we can make
> a change based on the policy. WDYT?
>
> Xiao
>
>
> On Thu, Mar 30, 2023 at 09:03, Dongjoon Hyun wrote:
>
>> Hi, Xiao and all.
>>
>> (cc Matei)
>>
>> Please hold on the vote.
>>
>> There is a concern expressed by ASF board because recent Slack activities
>> created an isolated silo outside of ASF mailing list archive.
>>
>> We need to establish a way to embrace it back to ASF archive before
>> starting anything official.
>>
>> Bests,
>> Dongjoon.
>>
>>
>>
>> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>>
>>> +1
>>>
>>> + @d...@spark.apache.org 
>>>
>>> This is a good idea. The other Apache projects (e.g., Pinot, Druid,
>>> Flink) have created their own dedicated Slack workspaces for faster
>>> communication. We can do the same in Apache Spark. The Slack workspace will
>>> be maintained by the Apache Spark PMC. I propose to initiate a vote for the
>>> creation of a new Apache Spark Slack workspace. Does that sound good?
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:
>>>
 I created one at slack called pyspark


 Mich Talebzadeh,
 Lead Solutions Architect/Engineering Lead
 Palantir Technologies Limited


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.




 On Tue, 28 Mar 2023 at 03:52, asma zgolli  wrote:

> +1 good idea, I d like to join as well.
>
> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>
>> Please let us know when the channel is created. I'd like to join :)
>>
>> Thank You & Best Regards
>> Winston Lai
>> --
>> *From:* Denny Lee 
>> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
>> *To:* Hyukjin Kwon 
>> *Cc:* keen ; user@spark.apache.org <
>> user@spark.apache.org>
>> *Subject:* Re: Slack for PySpark users
>>
>> +1 I think this is a great idea!
>>
>> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
>> wrote:
>>
>> Yeah, actually I think we should better have a slack channel so we
>> can easily discuss with users and developers.
>>
>> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>>
>> Hi all,
>> I really like *Slack *as communication channel for a tech community.
>> There is a Slack workspace for *delta lake users* (
>> https://go.delta.io/slack) that I enjoy a lot.
>> I was wondering if there is something similar for PySpark users.
>>
>> If not, would there be anything wrong with creating a new
>> Slack workspace for PySpark users? (when explicitly mentioning that this
>> is *not* officially part of Apache Spark)?
>>
>> Cheers
>> Martin
>>
>>
>
> --
> Asma ZGOLLI
>
> Ph.D. in Big Data - Applied Machine Learning
>
>


Re: Slack for PySpark users

2023-03-30 Thread Xiao Li
Hi, Dongjoon,

The other communities (e.g., Pinot, Druid, Flink) created their own Slack
workspaces last year. I did not see an objection from the ASF board. At the
same time, Slack workspaces are very popular and useful in most non-ASF
open source communities. TBH, we are kind of late. I think we can do the
same in our community.

We can follow the guide once the ASF has an official process for
archiving. Since our PMC is the owner of the Slack workspace, we can make
a change based on the policy. WDYT?

Xiao


On Thu, Mar 30, 2023 at 09:03, Dongjoon Hyun wrote:

> Hi, Xiao and all.
>
> (cc Matei)
>
> Please hold on the vote.
>
> There is a concern expressed by ASF board because recent Slack activities
> created an isolated silo outside of ASF mailing list archive.
>
> We need to establish a way to embrace it back to ASF archive before
> starting anything official.
>
> Bests,
> Dongjoon.
>
>
>
> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>
>> +1
>>
>> + @d...@spark.apache.org 
>>
>> This is a good idea. The other Apache projects (e.g., Pinot, Druid,
>> Flink) have created their own dedicated Slack workspaces for faster
>> communication. We can do the same in Apache Spark. The Slack workspace will
>> be maintained by the Apache Spark PMC. I propose to initiate a vote for the
>> creation of a new Apache Spark Slack workspace. Does that sound good?
>>
>> Cheers,
>>
>> Xiao
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:
>>
>>> I created one at slack called pyspark
>>>
>>>
>>> Mich Talebzadeh,
>>> Lead Solutions Architect/Engineering Lead
>>> Palantir Technologies Limited
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 28 Mar 2023 at 03:52, asma zgolli  wrote:
>>>
 +1 good idea, I d like to join as well.

 On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:

> Please let us know when the channel is created. I'd like to join :)
>
> Thank You & Best Regards
> Winston Lai
> --
> *From:* Denny Lee 
> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
> *To:* Hyukjin Kwon 
> *Cc:* keen ; user@spark.apache.org <
> user@spark.apache.org>
> *Subject:* Re: Slack for PySpark users
>
> +1 I think this is a great idea!
>
> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
> wrote:
>
> Yeah, actually I think we should better have a slack channel so we can
> easily discuss with users and developers.
>
> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>
> Hi all,
> I really like *Slack *as communication channel for a tech community.
> There is a Slack workspace for *delta lake users* (
> https://go.delta.io/slack) that I enjoy a lot.
> I was wondering if there is something similar for PySpark users.
>
> If not, would there be anything wrong with creating a new
> Slack workspace for PySpark users? (when explicitly mentioning that this
> is *not* officially part of Apache Spark)?
>
> Cheers
> Martin
>
>

 --
 Asma ZGOLLI

 Ph.D. in Big Data - Applied Machine Learning




Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
Hi Dongjoon,

Thanks for your point.

I gather you are referring to the archive below:

https://lists.apache.org/list.html?user@spark.apache.org

If not, please correct me.

Thanks


Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 30 Mar 2023 at 17:03, Dongjoon Hyun  wrote:

> Hi, Xiao and all.
>
> (cc Matei)
>
> Please hold on the vote.
>
> There is a concern expressed by ASF board because recent Slack activities
> created an isolated silo outside of ASF mailing list archive.
>
> We need to establish a way to embrace it back to ASF archive before
> starting anything official.
>
> Bests,
> Dongjoon.
>
>
>
> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:
>
>> +1
>>
>> + @d...@spark.apache.org 
>>
>> This is a good idea. The other Apache projects (e.g., Pinot, Druid,
>> Flink) have created their own dedicated Slack workspaces for faster
>> communication. We can do the same in Apache Spark. The Slack workspace will
>> be maintained by the Apache Spark PMC. I propose to initiate a vote for the
>> creation of a new Apache Spark Slack workspace. Does that sound good?
>>
>> Cheers,
>>
>> Xiao
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:
>>
>>> I created one at slack called pyspark
>>>
>>>
>>> Mich Talebzadeh,
>>> Lead Solutions Architect/Engineering Lead
>>> Palantir Technologies Limited
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 28 Mar 2023 at 03:52, asma zgolli  wrote:
>>>
 +1 good idea, I d like to join as well.

 On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:

> Please let us know when the channel is created. I'd like to join :)
>
> Thank You & Best Regards
> Winston Lai
> --
> *From:* Denny Lee 
> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
> *To:* Hyukjin Kwon 
> *Cc:* keen ; user@spark.apache.org <
> user@spark.apache.org>
> *Subject:* Re: Slack for PySpark users
>
> +1 I think this is a great idea!
>
> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
> wrote:
>
> Yeah, actually I think we should better have a slack channel so we can
> easily discuss with users and developers.
>
> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>
> Hi all,
> I really like *Slack *as communication channel for a tech community.
> There is a Slack workspace for *delta lake users* (
> https://go.delta.io/slack) that I enjoy a lot.
> I was wondering if there is something similar for PySpark users.
>
> If not, would there be anything wrong with creating a new
> Slack workspace for PySpark users? (when explicitly mentioning that this
> is *not* officially part of Apache Spark)?
>
> Cheers
> Martin
>
>

 --
 Asma ZGOLLI

 Ph.D. in Big Data - Applied Machine Learning




Re: Slack for PySpark users

2023-03-30 Thread Dongjoon Hyun
Hi, Xiao and all.

(cc Matei)

Please hold off on the vote.

The ASF board has expressed a concern that recent Slack activities
created an isolated silo outside of the ASF mailing list archive.

We need to establish a way to bring it back into the ASF archive before
starting anything official.

Bests,
Dongjoon.



On Wed, Mar 29, 2023 at 11:32 PM Xiao Li  wrote:

> +1
>
> + @d...@spark.apache.org 
>
> This is a good idea. The other Apache projects (e.g., Pinot, Druid, Flink)
> have created their own dedicated Slack workspaces for faster communication.
> We can do the same in Apache Spark. The Slack workspace will be maintained
> by the Apache Spark PMC. I propose to initiate a vote for the creation of a
> new Apache Spark Slack workspace. Does that sound good?
>
> Cheers,
>
> Xiao
>
>
>
>
>
>
>
> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:
>
>> I created one at slack called pyspark
>>
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead
>> Palantir Technologies Limited
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 28 Mar 2023 at 03:52, asma zgolli  wrote:
>>
>>> +1 good idea, I d like to join as well.
>>>
>>> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>>>
 Please let us know when the channel is created. I'd like to join :)

 Thank You & Best Regards
 Winston Lai
 --
 *From:* Denny Lee 
 *Sent:* Tuesday, March 28, 2023 9:43:08 AM
 *To:* Hyukjin Kwon 
 *Cc:* keen ; user@spark.apache.org <
 user@spark.apache.org>
 *Subject:* Re: Slack for PySpark users

 +1 I think this is a great idea!

 On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
 wrote:

 Yeah, actually I think we should better have a slack channel so we can
 easily discuss with users and developers.

 On Tue, 28 Mar 2023 at 03:08, keen  wrote:

 Hi all,
 I really like *Slack *as communication channel for a tech community.
 There is a Slack workspace for *delta lake users* (
 https://go.delta.io/slack) that I enjoy a lot.
 I was wondering if there is something similar for PySpark users.

 If not, would there be anything wrong with creating a new
 Slack workspace for PySpark users? (when explicitly mentioning that this is
 *not* officially part of Apache Spark)?

 Cheers
 Martin


>>>
>>> --
>>> Asma ZGOLLI
>>>
>>> Ph.D. in Big Data - Applied Machine Learning
>>>
>>>


[ANNOUNCE] Apache Celeborn(incubating) 0.2.1 available

2023-03-30 Thread rexxiong
Hi all,

The Apache Celeborn (Incubating) community is glad to announce the
new release of Apache Celeborn (Incubating) 0.2.1.

Celeborn is dedicated to improving the efficiency and elasticity of
different map-reduce engines and provides an elastic, highly efficient
service for intermediate data including shuffle data, spilled data,
result data, etc.
Currently, Celeborn fully supports Spark and improves Spark jobs'
performance, stability, and elasticity.

Download Link: https://celeborn.apache.org/download/

GitHub Release Tag:
- https://github.com/apache/incubator-celeborn/releases/tag/v0.2.1-incubating-rc0

Release Notes:
- https://celeborn.apache.org/community/release_notes/release_note_0.2.1

Website: https://celeborn.apache.org/

Celeborn Resources:
- Issue: https://issues.apache.org/jira/projects/CELEBORN
- Mailing list: d...@celeborn.apache.org

rexxiong
On behalf of the Apache Celeborn(incubating) community


Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
Ownership of the Slack workspace belongs to the Spark community.

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 30 Mar 2023 at 08:05,  wrote:

> Hey there,
>
> I agree, If Apache Spark PMC can maintain the spark community workspace,
> that would be great!
> Instead of creating a new one, they can also become the owner of the
> current one
> 
>  .
>
> Best regards,
> Shani
>
> On 30 Mar 2023, at 9:32, Xiao Li  wrote:
>
> 
> +1
>
> + @d...@spark.apache.org 
>
> This is a good idea. The other Apache projects (e.g., Pinot, Druid, Flink)
> have created their own dedicated Slack workspaces for faster communication.
> We can do the same in Apache Spark. The Slack workspace will be maintained
> by the Apache Spark PMC. I propose to initiate a vote for the creation of a
> new Apache Spark Slack workspace. Does that sound good?
>
> Cheers,
>
> Xiao
>
>
>
>
>
>
>
> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:
>
>> I created one at slack called pyspark
>>
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead
>> Palantir Technologies Limited
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 28 Mar 2023 at 03:52, asma zgolli  wrote:
>>
>>> +1 good idea, I d like to join as well.
>>>
>>> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>>>
 Please let us know when the channel is created. I'd like to join :)

 Thank You & Best Regards
 Winston Lai
 --
 *From:* Denny Lee 
 *Sent:* Tuesday, March 28, 2023 9:43:08 AM
 *To:* Hyukjin Kwon 
 *Cc:* keen ; user@spark.apache.org <
 user@spark.apache.org>
 *Subject:* Re: Slack for PySpark users

 +1 I think this is a great idea!

 On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
 wrote:

 Yeah, actually I think we should better have a slack channel so we can
 easily discuss with users and developers.

 On Tue, 28 Mar 2023 at 03:08, keen  wrote:

 Hi all,
 I really like *Slack *as communication channel for a tech community.
 There is a Slack workspace for *delta lake users* (
 https://go.delta.io/slack) that I enjoy a lot.
 I was wondering if there is something similar for PySpark users.

 If not, would there be anything wrong with creating a new
 Slack workspace for PySpark users? (when explicitly mentioning that this is
 *not* officially part of Apache Spark)?

 Cheers
 Martin


>>>
>>> --
>>> Asma ZGOLLI
>>>
>>> Ph.D. in Big Data - Applied Machine Learning
>>>
>>>


Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
We already have it

general - Apache Spark Community - Slack


Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 30 Mar 2023 at 07:31, Xiao Li  wrote:

> +1
>
> + @d...@spark.apache.org 
>
> This is a good idea. The other Apache projects (e.g., Pinot, Druid, Flink)
> have created their own dedicated Slack workspaces for faster communication.
> We can do the same in Apache Spark. The Slack workspace will be maintained
> by the Apache Spark PMC. I propose to initiate a vote for the creation of a
> new Apache Spark Slack workspace. Does that sound good?
>
> Cheers,
>
> Xiao
>
>
>
>
>
>
>
> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:
>
>> I created one at slack called pyspark
>>
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead
>> Palantir Technologies Limited
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 28 Mar 2023 at 03:52, asma zgolli  wrote:
>>
>>> +1 good idea, I d like to join as well.
>>>
>>> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>>>
 Please let us know when the channel is created. I'd like to join :)

 Thank You & Best Regards
 Winston Lai
 --
 *From:* Denny Lee 
 *Sent:* Tuesday, March 28, 2023 9:43:08 AM
 *To:* Hyukjin Kwon 
 *Cc:* keen ; user@spark.apache.org <
 user@spark.apache.org>
 *Subject:* Re: Slack for PySpark users

 +1 I think this is a great idea!

 On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
 wrote:

 Yeah, actually I think we should better have a slack channel so we can
 easily discuss with users and developers.

 On Tue, 28 Mar 2023 at 03:08, keen  wrote:

 Hi all,
 I really like *Slack *as communication channel for a tech community.
 There is a Slack workspace for *delta lake users* (
 https://go.delta.io/slack) that I enjoy a lot.
 I was wondering if there is something similar for PySpark users.

 If not, would there be anything wrong with creating a new
 Slack workspace for PySpark users? (when explicitly mentioning that this is
 *not* officially part of Apache Spark)?

 Cheers
 Martin


>>>
>>> --
>>> Asma ZGOLLI
>>>
>>> Ph.D. in Big Data - Applied Machine Learning
>>>
>>>


Re: Slack for PySpark users

2023-03-30 Thread shani . alishar
Hey there,

I agree, if the Apache Spark PMC can maintain the Spark community workspace,
that would be great!
Instead of creating a new one, they can also become the owner of the
current one.

Best regards,
Shani

On 30 Mar 2023, at 9:32, Xiao Li wrote:

> +1
>
> + @d...@spark.apache.org
>
> This is a good idea. The other Apache projects (e.g., Pinot, Druid, Flink)
> have created their own dedicated Slack workspaces for faster communication.
> We can do the same in Apache Spark. The Slack workspace will be maintained
> by the Apache Spark PMC. I propose to initiate a vote for the creation of a
> new Apache Spark Slack workspace. Does that sound good?
>
> Cheers,
>
> Xiao
>
> On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:
>
>> I created one at slack called pyspark
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead
>> Palantir Technologies Limited
>>
>> view my Linkedin profile
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Tue, 28 Mar 2023 at 03:52, asma zgolli wrote:
>>
>>> +1 good idea, I'd like to join as well.
>>>
>>> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>>>
>>>> Please let us know when the channel is created. I'd like to join :)
>>>>
>>>> Thank You & Best Regards
>>>> Winston Lai
>>>>
>>>> From: Denny Lee
>>>> Sent: Tuesday, March 28, 2023 9:43:08 AM
>>>> To: Hyukjin Kwon
>>>> Cc: keen; user@spark.apache.org
>>>> Subject: Re: Slack for PySpark users
>>>>
>>>> +1 I think this is a great idea!
>>>>
>>>> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon wrote:
>>>>
>>>> Yeah, actually I think we should better have a slack channel so we can
>>>> easily discuss with users and developers.
>>>>
>>>> On Tue, 28 Mar 2023 at 03:08, keen wrote:
>>>>
>>>> Hi all,
>>>> I really like Slack as communication channel for a tech community.
>>>> There is a Slack workspace for delta lake users (
>>>> https://go.delta.io/slack) that I enjoy a lot.
>>>> I was wondering if there is something similar for PySpark users.
>>>>
>>>> If not, would there be anything wrong with creating a new
>>>> Slack workspace for PySpark users? (when explicitly mentioning that this
>>>> is not officially part of Apache Spark)?
>>>>
>>>> Cheers
>>>> Martin
>>>
>>> --
>>> Asma ZGOLLI
>>>
>>> Ph.D. in Big Data - Applied Machine Learning




Re: Slack for PySpark users

2023-03-30 Thread Xiao Li
+1

+ @d...@spark.apache.org 

This is a good idea. The other Apache projects (e.g., Pinot, Druid, Flink)
have created their own dedicated Slack workspaces for faster communication.
We can do the same in Apache Spark. The Slack workspace will be maintained
by the Apache Spark PMC. I propose to initiate a vote for the creation of a
new Apache Spark Slack workspace. Does that sound good?

Cheers,

Xiao







On Tue, Mar 28, 2023 at 07:07, Mich Talebzadeh wrote:

> I created one at slack called pyspark
>
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Mar 2023 at 03:52, asma zgolli  wrote:
>
>> +1 good idea, I d like to join as well.
>>
>> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>>
>>> Please let us know when the channel is created. I'd like to join :)
>>>
>>> Thank You & Best Regards
>>> Winston Lai
>>> --
>>> *From:* Denny Lee 
>>> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
>>> *To:* Hyukjin Kwon 
>>> *Cc:* keen ; user@spark.apache.org <
>>> user@spark.apache.org>
>>> *Subject:* Re: Slack for PySpark users
>>>
>>> +1 I think this is a great idea!
>>>
>>> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
>>> wrote:
>>>
>>> Yeah, actually I think we should better have a slack channel so we can
>>> easily discuss with users and developers.
>>>
>>> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>>>
>>> Hi all,
>>> I really like *Slack *as communication channel for a tech community.
>>> There is a Slack workspace for *delta lake users* (
>>> https://go.delta.io/slack) that I enjoy a lot.
>>> I was wondering if there is something similar for PySpark users.
>>>
>>> If not, would there be anything wrong with creating a new
>>> Slack workspace for PySpark users? (when explicitly mentioning that this is
>>> *not* officially part of Apache Spark)?
>>>
>>> Cheers
>>> Martin
>>>
>>>
>>
>> --
>> Asma ZGOLLI
>>
>> Ph.D. in Big Data - Applied Machine Learning
>>
>>