RE: Regarding spark-3.2.0 decommission features.

2022-01-26 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Dongjoon Hyun, Any input on the issue below would be helpful. Please let us know if we're missing anything. Thanks and Regards, Abhishek From: Patidar, Mohanlal (Nokia - IN/Bangalore) Sent: Thursday, January 20, 2022 11:58 AM To: user@spark.apache.org Subject: Suspected SPAM - RE:

Re: question for definition of column types

2022-01-26 Thread Sean Owen
You can cast the cols as well. But are the columns strings to begin with? They could also actually be doubles. On Wed, Jan 26, 2022 at 8:49 PM wrote: > when creating dataframe from a list, how can I specify the col type? > > such as: > > >>> df = > >>> >

Re: question for definition of column types

2022-01-26 Thread Peyman Mohajerian
from pyspark.sql.types import *
list = [("buck trends", "ceo", 20.00, 0.25, "100")]
schema = StructType([
    StructField("name", StringType(), True),
    StructField("title", StringType(), True),
    StructField("salary", DoubleType(), True),
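As a minimal sketch of the same idea: PySpark's `createDataFrame` accepts a schema either as a `StructType` or as a DDL-style string. The example below uses the string form and also shows a cast after creation. The types chosen for `rate` and `insurance`, and the helper name `build_df`, are assumptions for illustration and are not taken from the original reply.

```python
# Sample row modelled on the one in the reply above.
rows = [("buck trends", "ceo", 20.00, 0.25, "100")]

# DDL-style schema string. The types for "rate" and "insurance" are assumptions.
SCHEMA_DDL = "name string, title string, salary double, rate double, insurance string"


def build_df(spark):
    """Build a typed DataFrame from `rows` using an existing SparkSession."""
    df = spark.createDataFrame(rows, schema=SCHEMA_DDL)
    # A column can also be cast after creation, e.g. the string "100" to an integer.
    return df.withColumn("insurance", df["insurance"].cast("int"))
```

With a running session, `build_df(spark).printSchema()` should report the declared types rather than whatever Spark would infer from the Python values.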

question for definition of column types

2022-01-26 Thread capitnfrakass
When creating a dataframe from a list, how can I specify the column type? For example:
df = spark.createDataFrame(list,["name","title","salary","rate","insurance"])
df.show()
+-----+-----+------+----+---------+
| name|title|salary|rate|insurance|

Re: Migration to Spark 3.2

2022-01-26 Thread Stephen Coy
Hi Aurélien! Please run mvn dependency:tree and check it for Jackson dependencies. Feel free to respond with the output if you have any questions about it. Cheers, Steve C > On 22 Jan 2022, at 10:49 am, Aurélien Mazoyer wrote: > > Hello, > > I migrated my code to Spark 3.2 and I am

unsubscribe

2022-01-26 Thread Lucas Schroeder Rossi
unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: [Spark UDF]: Where does UDF stores temporary Arrays/Sets

2022-01-26 Thread Sean Owen
Really depends on what your UDF is doing. You could read 2GB of XML into much more than that as a DOM representation in memory. Remember 15GB of executor memory is shared across tasks. You need to get a handle on what memory your code is using to begin with to start to reason about whether that's
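One rough way to get a handle on this is to measure, outside Spark, how much memory parsing a sample document into a tree actually takes. The sketch below is a Python analogue using the standard library's `tracemalloc` and `xml.etree.ElementTree`; a JVM-based UDF would need a JVM profiler or heap dump instead. The function name `peak_parse_bytes` and the sample document are hypothetical.

```python
import tracemalloc
import xml.etree.ElementTree as ET


def peak_parse_bytes(xml_text: str) -> int:
    """Return the peak bytes traced by tracemalloc while parsing xml_text into a tree."""
    tracemalloc.start()
    try:
        # Builds the whole tree in memory; `root` keeps it alive until we measure.
        root = ET.fromstring(xml_text)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Hypothetical document: 10,000 small <item> elements.
doc = "<root>" + "".join(f"<item id='{i}'>v{i}</item>" for i in range(10_000)) + "</root>"
peak = peak_parse_bytes(doc)
print(f"input: {len(doc)} chars, peak traced allocation: {peak} bytes")
```

Comparing the two printed numbers for a representative file gives a first estimate of how much larger the in-memory tree is than the raw text.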

Re: [Spark UDF]: Where does UDF stores temporary Arrays/Sets

2022-01-26 Thread Abhimanyu Kumar Singh
Thanks for your quick response. For some reason I can't use spark-xml (schema related issue). I've tried reducing the number of tasks per executor by increasing the number of executors, but it still throws the same error. I can't understand why even 15 GB of executor memory is not sufficient to

Re: [Spark UDF]: Where does UDF stores temporary Arrays/Sets

2022-01-26 Thread Sean Owen
Executor memory used shows data that is cached, not the VM usage. You're running out of memory somewhere, likely in your UDF, which probably parses massive XML docs as a DOM first or something. Use more memory, fewer tasks per executor, or consider using spark-xml if you are really just parsing
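To make "fewer tasks per executor" concrete, here is a back-of-the-envelope sketch. The number of tasks an executor runs concurrently is `spark.executor.cores` divided by `spark.task.cpus`, so a very rough per-task share of the heap is the executor memory divided by that number. This ignores memory Spark reserves for itself and for caching, so treat it as an estimate only. The 15 GB figure comes from the thread; the core counts and function names are hypothetical.

```python
def tasks_per_executor(executor_cores: int, task_cpus: int = 1) -> int:
    """Concurrent task slots per executor: spark.executor.cores // spark.task.cpus."""
    return executor_cores // task_cpus


def rough_memory_per_task_gb(executor_memory_gb: float,
                             executor_cores: int,
                             task_cpus: int = 1) -> float:
    """Rough estimate: executor heap split evenly across concurrent tasks."""
    return executor_memory_gb / tasks_per_executor(executor_cores, task_cpus)


# 15 GB executor with 5 cores and the default spark.task.cpus=1: five concurrent tasks.
print(rough_memory_per_task_gb(15, executor_cores=5))               # 3.0
# Setting spark.task.cpus equal to the core count leaves one task per executor.
print(rough_memory_per_task_gb(15, executor_cores=5, task_cpus=5))  # 15.0
```

Under these assumptions, lowering `spark.executor.cores` or raising `spark.task.cpus` increases the share each task can use without changing the executor's total memory.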

[Spark UDF]: Where does UDF stores temporary Arrays/Sets

2022-01-26 Thread Abhimanyu Kumar Singh
I'm doing some complex operations inside a Spark UDF (parsing huge XML).
Dataframe:
| value |
| Content of XML File 1 |
| Content of XML File 2 |
| Content of XML File N |
val df = Dataframe.select(UDF_to_parse_xml(value))
UDF looks something like:
val XMLelements : Array[MyClass1] =