from:"ashok34...@yahoo.com.INVALID"

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread ashok34...@yahoo.com.INVALID

Good idea. Will be useful. +1 On Monday, 18 March 2024 at 11:00:40 GMT, Mich Talebzadeh wrote: Some of you may be aware that Databricks community have just launched a knowledge sharing hub. I thought it would be a good idea for the Apache Spark user group to have the

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread ashok34...@yahoo.com.INVALID

Hey Mich, Thanks for this introduction on your forthcoming proposal "Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics". I recently came across an article by Databricks with title Scalable Spark Structured Streaming for REST API Destinations. Their use

Re: Clarification with Spark Structured Streaming

2023-10-09 Thread ashok34...@yahoo.com.INVALID

il's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Sun, 8 Oct 2023 at 19:50, ashok34...@yahoo.com.INVALID wrote: Hello team 1) In Spark Structured Streaming does commit mean streaming data

Clarification with Spark Structured Streaming

2023-10-08 Thread ashok34...@yahoo.com.INVALID

Hello team 1) In Spark Structured Streaming does commit mean streaming data has been delivered to the sink like Snowflake? 2) if sinks like Snowflake cannot absorb or digest streaming data in a timely manner, will there be an impact on spark streaming itself? Thanks AK

Need to split incoming data into PM on time column and find the top 5 by volume of data

2023-09-21 Thread ashok34...@yahoo.com.INVALID

Hello gurus, I have a Hive table created as below (there are more columns) CREATE TABLE hive.sample_data ( incoming_ip STRING, time_in TIMESTAMP, volume INT ); Data is stored in that table In PySpark, I want to select the top 5 incoming IP addresses with the highest total volume of data
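The aggregation asked about here can be sketched in plain Python to make the logic concrete. This is only an illustration under stated assumptions: the helper name `top_ips_by_volume` and the sample rows are hypothetical, and "PM" is taken to mean `time_in` at or after 12:00, which the truncated post does not confirm. In PySpark the same shape would typically be a filter on the hour of `time_in`, then `groupBy("incoming_ip")`, a sum over `volume`, a descending `orderBy`, and `limit(5)`.

```python
from collections import Counter
from datetime import datetime


def top_ips_by_volume(rows, n=5, pm_only=False):
    """Return the n IPs with the highest total volume.

    rows: iterable of (incoming_ip, time_in, volume) tuples, mirroring the
    three columns shown in the post. pm_only keeps only rows whose time_in
    is 12:00 or later (an assumed reading of "PM").
    """
    totals = Counter()
    for incoming_ip, time_in, volume in rows:
        if pm_only and time_in.hour < 12:
            continue  # skip morning rows when only PM data is wanted
        totals[incoming_ip] += volume
    # most_common sorts by total volume, highest first
    return totals.most_common(n)


# Hypothetical sample data, not taken from the thread
sample_rows = [
    ("10.0.0.1", datetime(2023, 9, 21, 9, 0), 100),
    ("10.0.0.2", datetime(2023, 9, 21, 13, 0), 300),
    ("10.0.0.1", datetime(2023, 9, 21, 14, 30), 50),
    ("10.0.0.3", datetime(2023, 9, 21, 23, 15), 200),
    ("10.0.0.2", datetime(2023, 9, 21, 8, 45), 10),
]

all_day = top_ips_by_volume(sample_rows)
pm_only = top_ips_by_volume(sample_rows, pm_only=True)
```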

Re: Filter out 20% of rows

2023-09-16 Thread ashok34...@yahoo.com.INVALID

:44:14|
|  84.183.253.20| 7.707176860385722|2021-08-26 23:24:31|
|218.163.165.232| 9.458673015973213|2021-02-22 12:13:15|
|   62.57.20.153|1.5764916247359229|2021-11-06 12:41:59|
| 98.171.202.249| 3.546118349483626|2022-07-05 10:55:26|
|180.140.248.193|0.9512956363005021|2021-06-27 18:16:58|
| 13

Re: Seeking Professional Advice on Career and Personal Growth in the Apache Spark Community

2023-09-06 Thread ashok34...@yahoo.com.INVALID

Hello Mich, Thanking you for providing these useful feedbacks and responses. We appreciate your contribution to this community forum. I for myself find your posts insightful. +1 for me Best, AK On Wednesday, 6 September 2023 at 18:34:27 BST, Mich Talebzadeh wrote: Hi Varun, In answer

Re: Shuffle with Window().partitionBy()

2023-05-23 Thread ashok34...@yahoo.com.INVALID

, 2023, 18:48 ashok34...@yahoo.com.INVALID wrote: Hello, In Spark windowing, can a call to Window().partitionBy() cause a shuffle to take place? If so, what is the performance impact, if any, when the result set is large? Thanks

Shuffle with Window().partitionBy()

2023-05-12 Thread ashok34...@yahoo.com.INVALID

Hello, In Spark windowing, can a call to Window().partitionBy() cause a shuffle to take place? If so, what is the performance impact, if any, when the result set is large? Thanks
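As a rough illustration of why a window partitioned by a key generally implies a shuffle: all rows sharing a key have to end up in the same partition before the window function can run, so rows sitting elsewhere must move. The sketch below simulates that in plain Python. It is not Spark code; `target_partition`, `shuffle_by_key` and the sample data are hypothetical, and `zlib.crc32` merely stands in for whatever hash function the engine actually uses.

```python
import zlib


def target_partition(key, num_partitions):
    # Deterministic stand-in hash; an assumption, not Spark's own hash function
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle_by_key(partitions, key_fn, num_partitions):
    """Redistribute rows so that equal keys land in the same partition.

    Returns the new partitions and the number of rows that changed
    partition, a crude proxy for data moved during a shuffle.
    """
    new_partitions = [[] for _ in range(num_partitions)]
    moved = 0
    for source_index, rows in enumerate(partitions):
        for row in rows:
            destination = target_partition(key_fn(row), num_partitions)
            if destination != source_index:
                moved += 1
            new_partitions[destination].append(row)
    return new_partitions, moved


# Hypothetical input: (key, value) rows scattered across three partitions
before = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 4), ("c", 5)],
    [("b", 6), ("a", 7)],
]
after, rows_moved = shuffle_by_key(before, key_fn=lambda row: row[0], num_partitions=3)
```

The more rows that sit in the "wrong" partition for their key, the more data has to cross the network, which is where the cost on large result sets comes from.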

Portability of dockers built on different cloud platforms

2023-04-05 Thread ashok34...@yahoo.com.INVALID

Hello team Is it possible to use Spark docker built on GCP on AWS without rebuilding from new on AWS? Will that work please. AK

Re: Online classes for spark topics

2023-03-08 Thread ashok34...@yahoo.com.INVALID

disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Tue, 7 Mar 2023 at 19:17, ashok34...@yahoo.com.INVALID wrote: Hello gurus, Does Spark arrange online webinars for special topics like Spark on K8s, data science and Spark

Online classes for spark topics

2023-03-07 Thread ashok34...@yahoo.com.INVALID

Hello gurus, Does Spark arrange online webinars for special topics like Spark on K8s, data science and Spark Structured Streaming? I would be most grateful if experts can share their experience with learners with intermediate knowledge like myself. Hopefully we will find the practical

Re: spark+kafka+dynamic resource allocation

2023-01-28 Thread ashok34...@yahoo.com.INVALID

Hi, Worth checking this link https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation On Saturday, 28 January 2023 at 06:18:28 GMT, Lingzhe Sun wrote:

Re: Issue while creating spark app

2022-02-28 Thread ashok34...@yahoo.com.INVALID

Thanks for all these useful info Hi all What is the current trend. Is it Spark on Scala with intellij or Spark on python with pycharm. I am curious because I have moderate experience with Spark on both Scala and python and want to focus on Scala OR python going forward with the intention of

Re: Deploying Spark on Google Kubernetes (GKE) autopilot, preliminary findings

2022-02-14 Thread ashok34...@yahoo.com.INVALID

Thanks Mich. Very insightful. AK On Monday, 14 February 2022, 11:18:19 GMT, Mich Talebzadeh wrote: Good question. However, we ought to look at what options we have so to speak. Let us consider Spark on Dataproc, Spark on Kubernetes and Spark on Dataflow Spark on DataProc is proven

What are the most common operators for shuffle in Spark

2022-01-23 Thread ashok34...@yahoo.com.INVALID

Hello, I know some operators in Spark are expensive because of shuffle. This document describes shuffle https://www.educba.com/spark-shuffle/ and says More shufflings in numbers are not always bad. Memory constraints and other impossibilities can be overcome by shuffling. In RDD, the below are a

Spark with parallel processing and event driven architecture

2022-01-14 Thread ashok34...@yahoo.com.INVALID

Hi gurus, I am trying to understand the role of Spark in an event driven architecture. I know Spark deals with massive parallel processing. However, does Spark follow event driven architecture like Kafka as well? Say handling producers, filtering and pushing the events to consumers like

Re: How to change a DataFrame column from nullable to not nullable in PySpark

2021-10-15 Thread ashok34...@yahoo.com.INVALID

arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Thu, 14 Oct 2021 at 12:50, ashok34...@yahoo.com.INVALID wrote: Gurus, I have an RDD in PySpark th

How to change a DataFrame column from nullable to not nullable in PySpark

2021-10-14 Thread ashok34...@yahoo.com.INVALID

Gurus, I have an RDD in PySpark that I can convert to DF through df = rdd.toDF() However, when I do df.printSchema() I see the columns as nullable = true by default root |-- COL-1: long (nullable = true) |-- COl-2: double (nullable = true) |-- COl-3: string (nullable = true) What would be the

Well balanced Python code with Pandas compared to PySpark

2021-07-29 Thread ashok34...@yahoo.com.INVALID

Hello team Someone asked me about well developed Python code with Pandas DataFrames and how that compares to PySpark. Under what situations would one choose PySpark instead of Python and Pandas? Appreciate AK

Re: Recovery when two spark nodes out of 6 fail

2021-06-25 Thread ashok34...@yahoo.com.INVALID

to be idempotent; ie; rerunning them shouldn’t change the outcome. Streaming jobs have benchmarking, and they will start from the last microbatch. This means that they might have to repeat the last microbatch. From: "ashok34...@yahoo.com.INVALID" Date: Friday, June 25, 2021 at 10:38 AM

Recovery when two spark nodes out of 6 fail

2021-06-25 Thread ashok34...@yahoo.com.INVALID

Greetings, This is a scenario that we need to come up with a comprehensive answers to fulfil please. If we have 6 spark VMs each running two executors via spark-submit. - we have two VMs failures at H/W level, rack failure - we lose 4 executors of spark out of 12 - Happening half

Re: Spark Streaming non functional requirements

2021-04-27 Thread ashok34...@yahoo.com.INVALID

, ashok34...@yahoo.com.INVALID wrote: Hello, When we design a typical spark streaming process, the focus is to get functional requirements. However, I have been asked to provide non-functional requirements as well. Likely things I can consider are Fault tolerance and Reliability (component

Spark Streaming non functional requirements

2021-04-26 Thread ashok34...@yahoo.com.INVALID

Hello, When we design a typical spark streaming process, the focus is to get functional requirements. However, I have been asked to provide non-functional requirements as well. Likely things I can consider are Fault tolerance and Reliability (component failures). Are there a standard list of

Python level of knowledge for Spark and PySpark

2021-04-14 Thread ashok34...@yahoo.com.INVALID

Hi gurus, I have knowledge of Java, Scala and good enough knowledge of Spark, Spark SQL and Spark Functional programing with Scala. I have started using Python with Spark PySpark. Wondering, in order to be proficient in PySpark, how much good knowledge of Python programing is needed? I know the

repartition in Spark

2020-11-09 Thread ashok34...@yahoo.com.INVALID

Hi, Just need some advice. - When we have multiple spark nodes running code, under what conditions does a repartition make sense? - Can we repartition and cache the result --> df = spark.sql("select from ...").repartition(4).cache - If we choose a repartition (4), will that repartition

26 matches
