Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Deepak Sharma
+1. I can contribute to it as well. On Tue, 19 Mar 2024 at 9:19 AM, Code Tutelage wrote: > +1 > > Thanks for proposing > > On Mon, Mar 18, 2024 at 9:25 AM Parsian, Mahmoud > wrote: > >> Good idea. Will be useful >> >> +1 >> >> *From: *ashok34...@yahoo.com.INVALID >>

Re: Data Contracts

2023-06-19 Thread Deepak Sharma
demonstrates > that one that does not, throws an Exception. > > I've had to slightly modify 3 Spark files to add the data contract > functionality. If you can think of a more elegant solution, I'd be very > grateful. > > Regards, > > Phillip > > > > > On Mo

Re: Data Contracts

2023-06-19 Thread Deepak Sharma
ce made a breaking change and was >>>> not even aware that his London based department existed, never mind >>>> depended on their data. In large organisations, this is pretty common. >>>> >>>> TBH, my proposal doesn't address this particular use

Re: Data Contracts

2023-06-12 Thread Deepak Sharma
Spark can be used with tools like Great Expectations as well to implement data contracts. I am not sure, though, whether Spark alone can do data contracts. I was reading a blog on data mesh and how to glue it together with data contracts; that’s where I came across this spark and great

Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-07 Thread Deepak Sharma
Please count me in. Can we have Spark on k8s with the spark-connect feature covered? On Wed, 8 Feb 2023 at 10:03, Kirti Ruge wrote: > Greetings everyone, > I would love to be part of this session. > IST > > > On Wed, 8 Feb 2023 at 9:13 AM, Colin Williams < > colin.williams.seat...@gmail.com>

Re: Spark Issue with Istio in Distributed Mode

2022-09-12 Thread Deepak Sharma
oy-v3-api-field-config-core-v3-httpprotocoloptions-idle-timeout > > > On Sat, Sep 3, 2022 at 4:23 AM Deepak Sharma > wrote: > >> Thank for the reply IIan . >> Can we set this in spark conf or does it need to goto istio / envoy conf? >> >> >>

Re: Spark Issue with Istio in Distributed Mode

2022-09-03 Thread Deepak Sharma
at 12:17 AM Deepak Sharma > wrote: > >> Hi All, >> In 1 of our cluster , we enabled Istio where spark is running in >> distributed mode. >> Spark works fine when we run it with Istio in standalone mode. >> In spark distributed mode , we are seeing that every 1 hou

Spark Issue with Istio in Distributed Mode

2022-09-02 Thread Deepak Sharma
Hi All, In one of our clusters, we enabled Istio where Spark is running in distributed mode. Spark works fine when we run it with Istio in standalone mode. In Spark distributed mode, we are seeing that every 1 hour or so the workers are getting disassociated from the master and then the master is not able

Observability around Flink Pipeline/stateful functions

2021-07-22 Thread Deepak Sharma
@dev@spark.apache.org @user I am looking for an example of an observability framework for Apache Flink pipelines. This could be message tracing across multiple Flink pipelines or querying the past state of a message that was processed by any Flink pipeline. If anyone has done similar work

Write to same hdfs dir from multiple spark jobs

2020-07-29 Thread Deepak Sharma
Hi, is there any design pattern for writing to the same HDFS directory from multiple Spark jobs? -- Thanks Deepak www.bigdatabig.com

GroupBy issue while running K-Means - Dataframe

2020-06-16 Thread Deepak Sharma
Hi All, I have a custom implementation of K-Means that needs the data to be grouped by a key in a DataFrame. There is a big data skew for some of the keys, where it exceeds the BufferHolder: Cannot grow BufferHolder by size 17112 because the size after growing exceeds size limitation

unsubscribe

2019-12-07 Thread Deepak Sharma

Read hdfs files in spark streaming

2019-06-09 Thread Deepak Sharma
I am using a Spark Streaming application to read from Kafka. The value coming from the Kafka message is the path to an HDFS file. I am using Spark 2.x, spark.readStream. What is the best way to read this path in Spark Streaming and then read the JSON stored at the HDFS path, maybe using spark.read.json,

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Deepak Sharma
Congratulations Holden & Burak. On Wed, Jan 25, 2017 at 8:23 AM, jiangxingbo wrote: > Congratulations Burak & Holden! > > > On Jan 25, 2017, at 2:13 AM, Reynold Xin wrote: > > > > Hi all, > > > > Burak and Holden have recently been elected as Apache Spark

Auto start spark jobs

2016-10-10 Thread Deepak Sharma
Hi All, is there any way to schedule an always-running Spark job so that it comes back up on its own after cluster maintenance? -- Thanks Deepak www.bigdatabig.com www.keosha.net

Re: Reading back hdfs files saved as case class

2016-10-07 Thread Deepak Sharma
ue for case classes > that are not very complex. > > On Fri, Oct 7, 2016 at 12:20 PM, Deepak Sharma <deepakmc...@gmail.com> > wrote: > >> Hi >> I am saving RDD[Example] in hdfs from spark program , where Example is >> case class. >> Now when i am trying to re

Reading back hdfs files saved as case class

2016-10-07 Thread Deepak Sharma
Hi, I am saving RDD[Example] to HDFS from a Spark program, where Example is a case class. Now when I try to read it back, it returns RDD[String] with the content as below: *Example(1,name,value)* The workaround can be to write it as a string in HDFS and read it back as a string and perform further
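
The string-based workaround described above can be illustrated with a minimal sketch. It is written in plain Python rather than Spark/Scala so that it runs standalone, and it is not the approach taken in the thread. The field names and types (`id`, `name`, `value`) and the helper `parse_example` are hypothetical; the post only shows the text form `Example(1,name,value)`.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical fields: the post only shows "Example(1,name,value)".
    id: int
    name: str
    value: str


def parse_example(line: str) -> Example:
    """Parse one line of the form 'Example(1,name,value)' back into a record."""
    prefix, suffix = "Example(", ")"
    text = line.strip()
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError(f"not an Example line: {line!r}")
    # Split into at most 3 parts so commas inside the last field are kept.
    fields = text[len(prefix):-len(suffix)].split(",", 2)
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    return Example(int(fields[0]), fields[1], fields[2])
```

This only works while the first two fields contain no commas, which is one reason a structured format is usually a safer choice than the record's string form.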

Use cases around image/video processing in spark

2016-08-10 Thread Deepak Sharma
Hi, does anyone use or know of a GitHub repo that can help me get started with image and video processing using Spark? The images/videos will be stored in S3 and I am planning to use S3 with Spark. In this case, how will Spark achieve distributed processing? Any code base or references is

How to map values read from test file to 2 different RDDs

2016-05-23 Thread Deepak Sharma
Hi, I am reading a text file with 16 fields. All the placeholders for the values of this text file have been defined in, say, 2 different case classes: Case1 and Case2. How do I map the values read from the text file so that my function in Scala can return 2 different RDDs, with each RDD of
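
One way to picture the mapping asked about above is the sketch below, in plain Python rather than Spark/Scala so that it runs standalone. Everything beyond the 16 fields and the names Case1 and Case2 is an assumption: the comma delimiter, the split of the first 8 fields into Case1 and the last 8 into Case2, and the helpers `split_line` and `split_lines` are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Case1(NamedTuple):
    fields: Tuple[str, ...]  # assumed: the first 8 of the 16 values


class Case2(NamedTuple):
    fields: Tuple[str, ...]  # assumed: the last 8 of the 16 values


def split_line(line: str, delimiter: str = ",", first_count: int = 8) -> Tuple[Case1, Case2]:
    """Parse one 16-field line once and build both records from it."""
    parts = line.rstrip("\n").split(delimiter)
    if len(parts) != 16:
        raise ValueError(f"expected 16 fields, got {len(parts)}")
    return Case1(tuple(parts[:first_count])), Case2(tuple(parts[first_count:]))


def split_lines(lines: List[str]) -> Tuple[List[Case1], List[Case2]]:
    """Return two parallel collections, one per record type."""
    pairs = [split_line(line) for line in lines]
    return [first for first, _ in pairs], [second for _, second in pairs]
```

In Spark the same shape would be one `map` that produces the pair, followed by two further `map` calls that each project one side; that translation is not shown here.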