Re: What is the best way to organize a join within a foreach?

2023-04-26 Thread Mich Talebzadeh
> > Thanks for your help team, > Marco. > > On Wed, Apr 26, 2023 at 6:21 AM Mich Talebzadeh > wrote: > >> Indeed very valid points by Ayan. How is email going to handle 1000s of >> records? As a solution architect I tend to replace users by customers and >> for ea

Re: What is the best way to organize a join within a foreach?

2023-04-26 Thread Mich Talebzadeh
instead, I suggest to use a > notification service or function. Spark should write to a queue (kafka, > sqs...pick your choice here). > > Best regards > Ayan > > On Wed, 26 Apr 2023 at 7:01 pm, Mich Talebzadeh > wrote: > >> Well OK in a nutshell you want the result se

Re: What is the best way to organize a join within a foreach?

2023-04-26 Thread Mich Talebzadeh
of the first ETL. How does this differ from using forEach? Performance wise forEach may not be optimal. Can you take the sample tables and try your method? HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile

Re: What is the best way to organize a join within a foreach?

2023-04-25 Thread Mich Talebzadeh
1|
|Mich| 50004| Mich's 4th order|104.11| 104.11|
|Mich| 50005| Mich's 5th order|105.11| 105.11|
|Mich| 50006| Mich's 6th order|106.11| 106.11|
|Mich| 50007| Mich's 7th order|107.11| 107.11|
|Mich| 50008| Mich's 8th order|108.11| 108.11|
|Mich| 50009| Mich's 9th order|109

Re: What is the best way to organize a join within a foreach?

2023-04-25 Thread Mich Talebzadeh
as comma separated csv file HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer

Re: What is the best way to organize a join within a foreach?

2023-04-25 Thread Mich Talebzadeh
Have you thought of using windowing functions <https://sparkbyexamples.com/spark/spark-sql-window-functions/> to achieve this? Effectively all your information is in the orders table. HTH

Re: Spark Kubernetes Operator

2023-04-14 Thread Mich Talebzadeh
Hi, What exactly are you trying to achieve? Spark on GKE works fine and you can run Dataproc now on GKE https://www.linkedin.com/pulse/running-google-dataproc-kubernetes-engine-gke-spark-mich/?trackingId=lz12GC5dRFasLiaJm5qDSw%3D%3D Unless I misunderstood your point. HTH

Re: Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-14 Thread Mich Talebzadeh
*spark-on-aws* in http://sparkcommunitytalk.slack.com/ HTH

Re: Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Mich Talebzadeh
Thanks! I will have a look.

Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Mich Talebzadeh
Is that a correct assumption? Thanks

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Mich Talebzadeh
Hi Lingzhe Sun, Thanks for your comments. I am afraid I won't be able to take part in this project and contribute. HTH

Re: spark streaming and kinesis integration

2023-04-10 Thread Mich Talebzadeh
is more suitable (as of now) for batch jobs than Spark Structured Streaming. https://issues.apache.org/jira/browse/SPARK-12133

Re: spark streaming and kinesis integration

2023-04-10 Thread Mich Talebzadeh
Structured Streaming is supported on k8s. HTH

Re: spark streaming and kinesis integration

2023-04-06 Thread Mich Talebzadeh
Do you have a high-level diagram of the proposed solution? As far as I know, k8s does not support Spark Structured Streaming?

Re: spark streaming and kinesis integration

2023-04-06 Thread Mich Talebzadeh
Hi Rajesh, What is the use case for Kinesis here? I have not used it personally. Which use case does it concern? https://aws.amazon.com/kinesis/ Can you use something else instead? HTH

Re: Potability of dockers built on different cloud platforms

2023-04-05 Thread Mich Talebzadeh
tps://cloud.google.com/container-registry>r and ecr <https://docs.aws.amazon.com/AmazonECR/latest/userguide/Registries.html> (container registries) HTH

Re: Troubleshooting ArrayIndexOutOfBoundsException in long running Spark application

2023-04-05 Thread Mich Talebzadeh
OK, Spark Structured Streaming. How are you getting messages into Spark? Is it Kafka? This to me indicates that the message is incomplete or has another value in the JSON. HTH

Re: Slack for PySpark users

2023-04-04 Thread Mich Talebzadeh
s not the best solution or is it > just that the link does not work. > > On Tue, 4 Apr 2023 at 09:06, Mich Talebzadeh < > mich.talebza...@gmail.com> wrote: > >> Hi Shani, >> >> I believe I am an admin so that is fine by me. >> >> Hi Dongjoon, >> >

Re: Slack for PySpark users

2023-04-04 Thread Mich Talebzadeh
Hi Shani, I believe I am an admin so that is fine by me. Hi Dongjoon, With regard to summarising the discussion etc, no need. It is like flogging a dead horse; we have already discussed it enough. I don't see the point of it. HTH

Re: Slack for PySpark users

2023-04-03 Thread Mich Talebzadeh
I agree, whatever individual sentiments are.

Re: Slack for PySpark users

2023-04-03 Thread Mich Talebzadeh
I for myself prefer to use the newly formed slack. sparkcommunitytalk.slack.com In summary, it may be a good idea to take a tour of it and see for yourself. Topics are sectioned as per user requests. I trust this answers your question.

Re: Looping through a series of telephone numbers

2023-04-02 Thread Mich Talebzadeh
-in-spark/#:~:text=Broadcast%20join%20is%20an%20optimization,always%20collected%20at%20the%20driver . HTH

Re: Looping through a series of telephone numbers

2023-04-02 Thread Mich Talebzadeh
sqlContext.sql("JOIN Query").show If you prefer to broadcast the reference data, you must first collect it on the driver before you broadcast it. This requires that your RDD fits in memory on your driver (and executors). You can then play around with that join. HTH

Re: Looping through a series of telephone numbers

2023-04-01 Thread Mich Talebzadeh
This may help: Spark rlike() Working with Regex Matching Examples <https://sparkbyexamples.com/spark/spark-rlike-regex-matching-examples/>

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Mich Talebzadeh
ection. HTH

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
Yes, history refers to completed jobs. Port 4040 shows the running jobs. You should have screenshots for executors and stages as well. HTH

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
Are you familiar with the Spark GUI, by default on port 4040? Have a look. HTH

Re: Creating InMemory relations with data in ColumnarBatches

2023-03-30 Thread Mich Talebzadeh
Is this purely for performance consideration?

Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
can coexist happily. On a more serious note, when I joined the user group back in 2015-2016, there was a lot of traffic. Currently we hardly get many mails daily, fewer than 5. So having a Slack-type medium may improve members' participation. So +1 for me as well.

Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
. Unless there is an overriding reason, we should embrace it as slack can co-exist with the other mailing lists and channels like linkedin etc. Hope this clarifies my position

Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
Hi Dongjoon, Thanks for your point. I gather you are referring to archive as below https://lists.apache.org/list.html?user@spark.apache.org Otherwise, correct me. Thanks

Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
The ownership of slack belongs to spark community

Re: Slack for PySpark users

2023-03-30 Thread Mich Talebzadeh
We already have it general - Apache Spark Community - Slack <https://app.slack.com/client/T04URTRBZ1R/C0501NBTNQG/thread/C050F0J5YNA-1680070839.296179>

Re: Topics for Spark online classes & webinars

2023-03-28 Thread Mich Talebzadeh
https://join.slack.com/t/sparkcommunitytalk/shared_invite/zt-1rk11diac-hzGbOEdBHgjXf02IZ1mvUA

Re: Topics for Spark online classes & webinars

2023-03-28 Thread Mich Talebzadeh
Hi Bjorn, you just need to create an account on slack and join any topic I believe HTH

Re: Topics for Spark online classes & webinars

2023-03-28 Thread Mich Talebzadeh
sers.
-- Spark internals and/or comparing spark 3 and 2
-- Spark Streaming & Spark Structured Streaming
-- Spark on notebooks
-- Spark on serverless (for example Spark on Google Cloud)
-- Spark on k8s
If you are willing to contribute to presentation materials, please register your interest in slack/webinar

Re: Slack for PySpark users

2023-03-28 Thread Mich Talebzadeh
I created one at slack called pyspark

Re: Question related to asynchronously map transformation using java spark structured streaming

2023-03-26 Thread Mich Talebzadeh
\ .
--conf "spark.driver.memory"=4G \
--conf "spark.executor.memory"=4G \
--conf "spark.num.executors"=4 \
--conf "spark.executor.cores"=2 \
HTH

Re: Adding OpenSearch as a secondary index provider to SparkSQL

2023-03-24 Thread Mich Talebzadeh
Hi, Are you talking about intelligent index scan here?

Re: Question related to parallelism using structed streaming parallelism

2023-03-21 Thread Mich Talebzadeh
or download it from here https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf

Topics for Spark online classes & webinars, next steps

2023-03-21 Thread Mich Talebzadeh
or @Denny Lee an email stating which topic and at what level you would like to take part. We propose to do a peer review of the draft presentation so no worries. Looking forward to hearing from you. HTH

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-17 Thread Mich Talebzadeh
Hi Karan, The version tested was 3.1.1. Are you running on Dataproc serverless 3.1.3?

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
Understood Nitin. It would be wrong to act against one's conviction. I am sure we can find a way around providing the contents. Regards

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
there as well. Best of luck.

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Mich Talebzadeh
and contributions are welcome. HTH

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
e, defined by Spark GUI as Time taken to process all jobs of a batch. The *Scheduling Delay* and the *Total Delay* are additional indicators of health. Then decide how to set the value. HTH

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
What benefits are you going for with increasing parallelism? Better throughput?

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Mich Talebzadeh
Hi Denny, That Apache Spark Linkedin page https://www.linkedin.com/company/apachespark/ looks fine. It also allows a wider audience to benefit from it. +1 for me

Re: Topics for Spark online classes & webinars

2023-03-13 Thread Mich Talebzadeh
Well that needs to be created first for this purpose. The appropriate name etc. to be decided. Maybe @Denny Lee can facilitate this as he offered his help. cheers

Re: Topics for Spark online classes & webinars

2023-03-13 Thread Mich Talebzadeh
to is welcome

Topics for Spark online classes & webinars

2023-03-13 Thread Mich Talebzadeh
Hi guys, To move forward I selected these topics from the thread "Online classes for spark topics". To take this further I propose a confluence page to be set up. Opinions and how-tos are welcome. Cheers

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Mich Talebzadeh

Re: Online classes for spark topics

2023-03-12 Thread Mich Talebzadeh
page for Spark so we can use it. I guess that would be part of the structure you mentioned. HTH

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-12 Thread Mich Talebzadeh
\
outputMode('complete'). \
option("numRows", 1000). \
option("truncate", "false"). \
format('console'). \
option('checkpointLocation', checkpoint_path). \
queryName

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread Mich Talebzadeh
and processors across multiple executors on multiple nodes. HTH

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread Mich Talebzadeh
... To note that if I execute collectAsList on the dataset at the beginning of the program What do you think collectAsList does?

Re: Spark StructuredStreaming - watermark not working as expected

2023-03-10 Thread Mich Talebzadeh
.withColumnRenamed('sum(sentOctets)', 'sentOctets') \
.withColumnRenamed('sum(recvdOctets)', 'recvdOctets') \
.fillna(0)
What does ldf.printSchema() return? HTH

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Mich Talebzadeh
For your Dataproc cluster, what type of machines are you using, for example n2-standard-4 with 4 vCPUs and 16 GB, or something else? How many nodes, and is autoscaling turned on? Most likely an executor memory limit? HTH

Re: How to share a dataset file across nodes

2023-03-09 Thread Mich Talebzadeh
rk.csv").option("inferSchema", "true").option("header", "true").load(csv_file)
listing_df.printSchema()
HTH

Re: read a binary file and save in another location

2023-03-09 Thread Mich Talebzadeh
Does this need any action in PySpark? How about importing using the shutil package? https://sparkbyexamples.com/python/how-to-copy-files-in-python/
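
As a minimal sketch of the shutil suggestion above, assuming the binary file sits on a filesystem the Python process can reach directly (not HDFS or object storage), plain Python can copy it without any Spark action. The helper name `copy_binary_file` and the demo paths are hypothetical and do not come from the thread.

```python
import shutil
import tempfile
from pathlib import Path


def copy_binary_file(src: str, dst_dir: str) -> Path:
    """Copy one file into dst_dir, creating the directory if needed."""
    target_dir = Path(dst_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 copies the bytes and tries to preserve metadata such as mtime
    return Path(shutil.copy2(src, target_dir / Path(src).name))


if __name__ == "__main__":
    # Self-contained demo in a throwaway directory (paths are illustrative only)
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "input.bin"
        source.write_bytes(bytes(range(256)))
        copied = copy_binary_file(str(source), str(Path(tmp) / "archive"))
        print(copied.read_bytes() == source.read_bytes())
```

If the file lives on distributed storage instead, this approach would not apply as written and a storage-specific client would be needed.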

Re: [Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-03-09 Thread Mich Talebzadeh
needs to be preserved. HTH

Re: Online classes for spark topics

2023-03-09 Thread Mich Talebzadeh
a draft list of topics of interest and share them in the forum to get the priority order. Well, those are my thoughts. Cheers

Re: Online classes for spark topics

2023-03-08 Thread Mich Talebzadeh
Hi, I guess I can schedule this work over a course of time. I for myself can contribute plus learn from others. So +1 for me. Let us see if anyone else is interested. HTH

Re: Online classes for spark topics

2023-03-07 Thread Mich Talebzadeh
. Anyone else? HTH

Re: [Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-03-07 Thread Mich Talebzadeh
m on the case so to speak. There is a considerable interest in Spark Structured Streaming across the board, especially in trading systems. HTH

Re: [Spark Structured Streaming] Do spark structured streaming is support sink to AWS Kinesis currently and how to handle if achieve quotas of kinesis?

2023-03-06 Thread Mich Talebzadeh
) are default but can be negotiated with the vendor to increase them. What facts have you established so far? HTH

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-05 Thread Mich Talebzadeh
9 ERROR streaming.MicroBatchExecution: Query newtopic [id = 19f4c6ad-11b8-451f-acf1-8bfbea7c370b, runId = dd26db7d-f4bf-4176-ae75-116eb67eb237] terminated with error HTH

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
ig['MDVariables']['targetTable'])
        df.unpersist()
        #print(f"""wrote to DB""")
        batchidMD = batchId
        print(batchidMD)
    else:
        print("DataFrame md is empty")
I trust I explained it adequately. Cheers

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Thanks. They are different batchIds. From sendToControl, newtopic batchId is 76. From sendToSink, md, batchId is 563. As a matter of interest, why does a global variable not work?
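
One plain-Python detail that often explains a "global did not work" symptom, offered only as a guess since the full code is not shown in this thread: assigning to a name inside a function binds a local unless the name is declared `global`. The sketch below uses no Spark at all. The two functions only mimic the shape of `foreachBatch` callbacks, their names echo the thread, the `shared` dict is a hypothetical alternative, and the `None` arguments stand in for DataFrames.

```python
# Module-level state shared between the two callbacks (illustrative names).
batchidMD = None          # written through a `global` declaration
shared = {"md": None}     # written by mutating a dict, no declaration needed


def sendToSink(df, batchId):
    # Without this declaration, `batchidMD = batchId` would only bind a local
    global batchidMD
    batchidMD = batchId
    shared["md"] = batchId


def sendToControl(dfnewtopic, batchId):
    # Reading a module-level name needs no declaration
    return f"newtopic batchId={batchId}, last md batchId={batchidMD}"


if __name__ == "__main__":
    sendToSink(None, 563)
    print(sendToControl(None, 76))
```

This only holds when both callbacks run in the same Python process. The two streaming queries still keep independent batchId counters, as the 76 versus 563 values in the thread show.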

How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
tion sendToControl(dfnewtopic, batchId2) so I can print it out. Defining a global did not work. So it sounds like I am missing something rudimentary here! Thanks

Re: SPIP architecture diagrams

2023-03-04 Thread Mich Talebzadeh
ication. I have tried to make it generic. However, trademarks are acknowledged. I have tried not to use color but I guess pointers are fair. Let me know your thoughts. Regards

Re: Spike on number of tasks - dynamic allocation

2023-02-27 Thread Mich Talebzadeh

Re: Spike on number of tasks - dynamic allocation

2023-02-27 Thread Mich Talebzadeh
Hi, What is the spark version and what type of cluster is it, spark on dataproc or other? HTH

Fwd: Auto-reply: Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-26 Thread Mich Talebzadeh
Hi, Can someone disable the below login from spark forums please? Sounds like someone left this email and we are receiving a spam type message anytime we respond. thanks

Re: Unable to handle bignumeric datatype in spark/pyspark

2023-02-25 Thread Mich Talebzadeh
Sounds like it is cosmetic. The important point is whether the data stored in GBQ is valid. HTH

Re: SPIP architecture diagrams

2023-02-24 Thread Mich Talebzadeh
considered? Why were they rejected? If no alternatives have been considered, the problem needs more thought.* HTH

Re: Unable to handle bignumeric datatype in spark/pyspark

2023-02-24 Thread Mich Talebzadeh
Hi Nidhi, can you create a BigQuery table with bignumeric and numeric column types, add a few rows and try to read it into Spark through a DataFrame, then do df.printSchema() and df.show(5, False)? HTH

Re: Spark with bigquery : Data type issue

2023-02-22 Thread Mich Talebzadeh
Hi, What version of Spark are you using, and how are you writing to the GBQ table? Does the source column in the ETL have NUMERIC(38), say coming from Oracle?

SPIP: Adding work load identity to Spark on Kubernetes documents (supersedes Secret Management)

2023-02-20 Thread Mich Talebzadeh
t; } Cloud service account keys do not expire and require manual rotation. Exporting service account keys has the potential to expand the scope of a security breach if it goes undetected. If an exported key is stolen, an attacker can use it to authenticate as that service account until noticed and ma

Re: Graceful shutdown SPARK Structured Streaming

2023-02-20 Thread Mich Talebzadeh
442-one-good-test-is-worth-a-thousand-expert-opinions>

Re: SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-19 Thread Mich Talebzadeh
Hi Dongjoon, This was an oversight from my side. I confused your involvement with docker build stuff. HTH

SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh
/list.html?d...@spark.apache.org Thanks.

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-16 Thread Mich Talebzadeh
On Wed, 15 Feb 2023 at 21:17, karan alang wrote: > thnks, Mich .. let me check this > > > > On Wed, Feb 15, 2023 at 1:42 AM Mich Talebzadeh > wrote: > >> >

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
It may help to check this article of mine Spark on Kubernetes, A Practitioner’s Guide <https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-/?trackingId=FDQORri0TBeJl02p3D%2B2JA%3D%3D> HTH

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-15 Thread Mich Talebzadeh
gs not s3 There is no point putting your python file in the docker image itself! HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for

Re: How to improve efficiency of this piece of code (returning distinct column values)

2023-02-12 Thread Mich Talebzadeh
Hi Sam, I am curious to know the business use case for this solution if any? HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility f

Re: How to improve efficiency of this piece of code (returning distinct column values)

2023-02-10 Thread Mich Talebzadeh
eateOrReplaceTempView("temp") ## do your distinct columns using windowing functions on temp table with SQL HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it

Re: How to upgrade a spark structure streaming application

2023-02-07 Thread Mich Talebzadeh
on file system and polling > periodically to stop running query. > > Thanks, > > Yoel > > > > -- view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your o

Fwd: Graceful shutdown SPARK Structured Streaming

2023-02-07 Thread Mich Talebzadeh
-- Forwarded message - From: Mich Talebzadeh Date: Thu, 6 May 2021 at 20:07 Subject: Re: Graceful shutdown SPARK Structured Streaming To: ayan guha Cc: Gourav Sengupta , user @spark < user@spark.apache.org> That is a valid question and I am not aware of any new ad

Re: Spark with GPU

2023-02-05 Thread Mich Talebzadeh
if you have several nodes with only one node having GPUs, you still have to wait for the result set to complete. In other words it will be as fast as the lowest denominator .. my postulation HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Re: Create table before inserting in SQL

2023-02-02 Thread Mich Talebzadeh
You may be able to do so in Python or Scala but I don't know the way in pure SQL. If your JDBC database is Hive you can do so easily HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer

Re: Create table before inserting in SQL

2023-02-01 Thread Mich Talebzadeh
? How do you verify if it exists? Can you share the code and the doc link? HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibil

Re: Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast

2023-01-30 Thread Mich Talebzadeh
Hi, Identify the cause of the shuffle. Also how are you using HDFS here? https://community.cloudera.com/t5/Support-Questions/Spark-Metadata-Fetch-Failed-Exception-Missing-an-output/td-p/203771 HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Re: Re: spark+kafka+dynamic resource allocation

2023-01-30 Thread Mich Talebzadeh
Sure, I suggest that you add a note to that Jira and express your interest. HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility f

Re: Re: spark+kafka+dynamic resource allocation

2023-01-29 Thread Mich Talebzadeh
proc/docs/concepts/configuring-clusters/autoscaling#autoscaling_and_spark_streaming> like Google Dataproc does not support Spark Structured Streaming either HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.c

Re: Spark SQL question

2023-01-28 Thread Mich Talebzadeh
ls: [keyword#226, occurence#227L], Partition Cols: []]* `data.group` with quotes is neither the name of the column nor its alias *HTH* view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:*

Re: Question regarding Spark 3.X performance

2023-01-27 Thread Mich Talebzadeh
of health. Do you have these stats for both versions? cheers view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss,

Re: Question regarding Spark 3.X performance

2023-01-26 Thread Mich Talebzadeh
3.3.1 excels in detail. For that you need to look at the Spark GUI matrix. HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for any
