Re: pyspark dataframe join with two different data type

2024-05-17 Thread Karthick Nk
s below, but here I am doing explode and doing distinct again, but I need to perform the action without doing this since this will impact performance again for the huge data. Thanks On Thu, May 16, 2024 at 8:33 AM Karthick Nk wrote: > Thanks Mich, > > I ha

Re: pyspark dataframe join with two different data type

2024-05-15 Thread Karthick Nk
ch-talebzadeh-ph-d-5205b2/> > > > https://en.everybodywiki.com/Mich_Talebzadeh > > > > *Disclaimer:* The information provided is correct to the best of my > knowledge but of course cannot be guaranteed . It is essential to note > that, as with any advice, quote "one te

Re: pyspark dataframe join with two different data type

2024-05-14 Thread Karthick Nk
and "b" both exist in the > array. So Spark is correctly performing the join. It looks like you need to > find another way to model this data to get what you want to achieve. > > Are the values of "a" and "b" related to each other in any way? > > - Da

Re: pyspark dataframe join with two different data type

2024-05-10 Thread Karthick Nk
t; <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > https://en.everybodywiki.com/Mich_Talebzadeh > > > > *Disclaimer:* The information provided is correct to the best of my > knowledge but of course cannot be guaranteed . It is essential to note > that, as wit

Re: ********Spark streaming issue to Elastic data**********

2024-05-06 Thread Karthick Nk
is worth one-thousand > expert opinions (Werner <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von > Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > > On Thu, 2 May 2024 at 21:25, Karthick Nk wrote: > >> Hi All, >> >> Requirements: &

********Spark streaming issue to Elastic data**********

2024-05-02 Thread Karthick Nk
Hi All, Requirements: I am working on the data flow, which will use the view definition(view definition already defined in schema), there are multiple tables used in the view definition. Here we want to stream the view data into elastic index based on if any of the table(used in the view

Data ingestion into elastic failing using pyspark

2024-03-11 Thread Karthick Nk
Hi @all, I am using pyspark program to write the data into elastic index by using upsert operation (sample code snippet below). def writeDataToES(final_df): write_options = { "es.nodes": elastic_host, "es.net.ssl": "false", "es.nodes.wan.only": "true",
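A fuller version of the truncated snippet above might look like the sketch below. The option names come from the elasticsearch-hadoop connector's documentation; the host, port, index, and id-column values are placeholders, not from the original message:

```python
# Sketch of building write options for an Elasticsearch upsert via the
# elasticsearch-hadoop connector. All argument values are placeholders.
def es_upsert_options(elastic_host, index_name, id_column):
    return {
        "es.nodes": elastic_host,
        "es.port": "9200",
        "es.net.ssl": "false",
        "es.nodes.wan.only": "true",
        "es.write.operation": "upsert",  # update if the doc id exists, else insert
        "es.mapping.id": id_column,      # dataframe column used as the document _id
        "es.resource": index_name,
    }

# usage sketch (requires the elasticsearch-spark jar on the classpath):
# final_df.write.format("org.elasticsearch.spark.sql") \
#     .options(**es_upsert_options("localhost", "my-index", "id")) \
#     .mode("append").save()
```

With `es.write.operation` set to `upsert`, documents whose `_id` already exists in the index are updated in place rather than duplicated.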

pyspark dataframe join with two different data type

2024-02-29 Thread Karthick Nk
Hi All, I have two dataframe with below structure, i have to join these two dataframe - the scenario is one column is string in one dataframe and in other df join column is array of string, so we have to inner join two df and get the data if string value is present in any of the array of string

Issue in Creating Temp_view in databricks and using spark.sql().

2024-01-30 Thread Karthick Nk
Hi Team, I am using structured streaming in pyspark in Azure Databricks; in that I am creating a temp_view from a dataframe (df.createOrReplaceTempView('temp_view')) for performing spark sql query transformation. In that I am facing the issue that the temp_view is not found, so as a workaround i have

Re: Updating delta file column data

2023-10-09 Thread Karthick Nk
to perform the required action in an optimistic way? Note: Please feel free to ask, if you need further information. Thanks & regards, Karthick On Mon, Oct 2, 2023 at 10:53 PM Karthick Nk wrote: > Hi community members, > > In databricks adls2 delta tables, I need to perform the below

Updating delta file column data

2023-10-02 Thread Karthick Nk
Hi community members, In databricks adls2 delta tables, I need to perform the below operation; could you help me with your thoughts. I have delta tables with one column with data type string, which contains the json data in string data type. I need to do the following 1. I have to update one

Re: Urgent: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem

2023-09-22 Thread Karthick
Hi All, It would be helpful to get any pointers on the problem addressed. Thanks Karthick. On Wed, Sep 20, 2023 at 3:03 PM Gowtham S wrote: > Hi Spark Community, > > Thank you for bringing up this issue. We've also encountered the same > challenge and are actively workin

Urgent: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem

2023-09-19 Thread Karthick
time and consideration. Thanks & regards, Karthick.

Re: Error while merge in delta table

2023-05-12 Thread Karthick Nk
the tables in a concurrent manner; is this the issue (do we have any constraint for it)? For this kind of runtime error, how can we usually identify the root cause? On Thu, May 11, 2023 at 9:37 PM Farhan Misarwala wrote: > Hi Karthick, > > I think I have seen this before and this

Error while merge in delta table

2023-05-10 Thread Karthick Nk
Hi, I am trying to merge a dataframe with a delta table in databricks, but I am getting an error; I have attached the code snippet and error message for reference below, code: [image: image.png] error: [image: image.png] Thanks

***pyspark.sql.functions.monotonically_increasing_id()***

2023-04-28 Thread Karthick Nk
Hi @all, I am using monotonically_increasing_id() in the pyspark function for removing one field from a json field in one column from the delta table; please refer to the below code df = spark.sql(f"SELECT * from {database}.{table}") df1 = spark.read.json(df.rdd.map(lambda x: x.data), multiLine =

Re: Converting None/Null into json in pyspark

2022-10-04 Thread Karthick Nk
Yeachan Park wrote: > Hi, > > There's a config option for this. Try setting this to false in your spark > conf. > > spark.sql.jsonGenerator.ignoreNullFields > > On Tuesday, October 4, 2022, Karthick Nk wrote: > >> Hi all, >> >> I need to convert pyspark

Converting None/Null into json in pyspark

2022-10-03 Thread Karthick Nk
Hi all, I need to convert a pyspark dataframe into json. While converting, if all row values are null/None for a particular column, that column gets removed from the data. Could you suggest a way to avoid this? I need to convert the dataframe into json with all columns. Thanks