Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Khalid Mammadov
+1

> On Fri, 10 Nov 2023, 15:23, Peter Toth wrote:
> +1
>
>> On Fri, Nov 10, 2023, 14:09, Bjørn Jørgensen wrote:
>> +1
>>
>>> On Fri, 10 Nov 2023 at 08:39, Nan Zhu wrote:
>>> just curious what happened on google’s spark operator?
>>>
>>>> On Thu, Nov 9, 2023 at 19:12, Ilan Filonenko wrote:

r-project.org is down

2023-10-27 Thread Khalid Mammadov
Hi devs, Just a heads up: *r-project.org is down* and may affect builds if the infra image cache needs a rebuild. Not sure who needs to fix this. Cheers, Khalid

Re: Slack for Spark Community: Merging various threads

2023-04-06 Thread Khalid Mammadov
Hi, I have been following this thread and want to add my 2 cents. The Apache Airflow project has a Slack channel and it's self-service! I joined it and it costs me nothing to use, whereas the one Mich sent requires a paid subscription. IMHO, that would be a major barrier for people joining just to ask a question. I

Re: Missing string replace function

2022-10-02 Thread Khalid Mammadov
+

> On Oct 2, 2022, at 12:21 PM, Russell Spitzer wrote:
> https://spark.apache.org/docs/3.3.0/api/sql/index.html#replace
> This was added in Spark 2.3.0 as far as I can tell.
> https://github.com/apache/spark/pull/18047
>
>> On Oct 2, 2022, at 11:19 AM, Khal

Missing string replace function

2022-10-02 Thread Khalid Mammadov
Hi, As you know, there's no string "replace" function in pyspark.sql.functions for PySpark, nor in org.apache.spark.sql.functions for Scala/Java, and I was wondering why that is. I know there's regexp_replace instead, as well as na.replace, or SQL via expr. I think it's one of the fundamental functions

Re: Contributions and help needed in SPARK-40005

2022-08-30 Thread Khalid Mammadov
Will do, thanks!

> On Wed, 31 Aug 2022, 01:14, Hyukjin Kwon wrote:
> Oh, that's a mistake. Please just go ahead and reuse that JIRA :-).
> You can just create a PR reusing the same JIRA ID for functions.py
>
>> On Wed, 31 Aug 2022 at 01:18, Khalid Mammadov wrote:

Re: Contributions and help needed in SPARK-40005

2022-08-30 Thread Khalid Mammadov
Hi @Hyukjin Kwon, I see you have resolved the JIRA, and I have some more work to do in functions.py (only 50% done). So shall I create a new JIRA for each new PR, or is it OK to reuse this one?

> On Fri, 19 Aug 2022, 09:29, Khalid Mammadov wrote:
> Will do, thanks!
>
>> On Fri, 19 Aug 20

Re: Contributions and help needed in SPARK-40005

2022-08-19 Thread Khalid Mammadov
L][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 25 functions)

> Thanks!
>
> On Fri, 19 Aug 2022 at 16:50, Khalid Mammadov wrote:
>> I am picking up "functions.py" if no one is already
>>
>> On Fri, 19 Aug 2022, 07:56, Khalid Mammadov,

Re: Contributions and help needed in SPARK-40005

2022-08-19 Thread Khalid Mammadov
I am picking up "functions.py" if no one is already.

> On Fri, 19 Aug 2022, 07:56, Khalid Mammadov wrote:
> I thought it was all finished (I checked a few). Do you have a list of the 50%? Happy to contribute
>
>> On Fri, 19 Aug 2022, 05:54, Hyukjin Kwon wrote:
>> We

Re: Contributions and help needed in SPARK-40005

2022-08-19 Thread Khalid Mammadov
> I would like to do some work and pick up *Window.py* if possible.
>
> Thanks,
> Qian
>
>> On Tue, 9 Aug 2022 at 10:41, Hyukjin Kwon wrote:
>> Thanks Khalid for taking a look.

Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Khalid Mammadov
Hi Hyukjin, That's a great initiative. Here is a PR that addresses one of those issues and is waiting for review: https://github.com/apache/spark/pull/37408. Perhaps it would also be good to track these pending issues somewhere to avoid duplicated effort. For example, I would like to pick up *union*

Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Khalid Mammadov
Hi Mitch, IMO it's done to provide the most flexibility: some users may need a limited/restricted version of the image, or one with additional software installed that the executors use during processing. So, in your case you only need to provide the first one, since the other two configs
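The "first one" and "other two" configs being discussed can be sketched as a `spark-defaults.conf` fragment. The image names below are hypothetical placeholders; the key names are the standard Spark-on-Kubernetes properties, where the shared key covers both driver and executors unless overridden.

```
# One image for both driver and executors (usually all you need):
spark.kubernetes.container.image            myrepo/spark-py:3.5.0

# Only set these when driver and executor images must differ:
# spark.kubernetes.driver.container.image   myrepo/spark-driver:3.5.0
# spark.kubernetes.executor.container.image myrepo/spark-exec:3.5.0
```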

Re: Performance of PySpark jobs on the Kubernetes cluster

2021-08-10 Thread Khalid Mammadov
Hi Mich, I think you need to check your code. If the code does not use the PySpark API effectively, you may see this, i.e. if you use pure Python/pandas APIs rather than PySpark-style transform -> transform -> action, e.g. df.select(...).withColumn(...)...count(). Hope this helps to put you in the right direction.