Re: groupby question

2022-05-05 Thread wilson
Don't know what you were trying to express. It's better if you can give a sample dataset and the purpose you want to achieve; then we may give the right solution. Thanks Irene Markelic wrote: I have an RDD that I want to group according to some key, but it just doesn't work. I am a Scala

Re: [EXTERNAL] Parse Execution Plan from PySpark

2022-05-05 Thread Pablo Alcain
Amazing, it looks like parsing the execution plan from plain text can be a good first approach, at least for a proof of concept. I'll let you guys know how it works out! Thanks Walaa for those links, they are super useful. On Tue, May 3, 2022 at 5:39 AM Walaa Eldin Moustafa wrote: > Hi Pablo, >
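Since the thread settles on parsing the textual plan as a first step, here is a minimal sketch of that idea. The plan excerpt below is illustrative only (not output from a real job), and the regex-based extraction is one possible approach, not the thread's actual code:

```python
import re

# Hypothetical excerpt of what df.explain() might print; a real plan will differ.
plan_text = """\
== Physical Plan ==
*(2) HashAggregate(keys=[dept], functions=[sum(salary)])
+- Exchange hashpartitioning(dept, 200)
   +- *(1) Project [dept, salary]
      +- FileScan parquet [dept,salary]
"""

def operators(plan):
    """Pull the leading operator name out of each physical-plan line."""
    ops = []
    for line in plan.splitlines():
        # Strip tree-drawing prefixes like '+-', whitespace, and
        # whole-stage-codegen markers like '*(1)'.
        cleaned = re.sub(r"^[\s+\-:*()\d]+", "", line)
        match = re.match(r"[A-Za-z]+", cleaned)
        if match:
            ops.append(match.group(0))
    return ops

print(operators(plan_text))
# → ['HashAggregate', 'Exchange', 'Project', 'FileScan']
```

This is brittle by design (plan text is not a stable interface across Spark versions), which is why it fits a proof of concept rather than production use.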

groupby question

2022-05-05 Thread Irene Markelic
Hi everybody, I have an RDD that I want to group according to some key, but it just doesn't work. I am a Scala beginner. So I have the following RDD: langs: List[String] rdd: RDD[WikipediaArticle]) val meinVal = rdd.flatMap(article=>langs.map(lang=>{if (article.mentionsLanguage(lang){
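For what it's worth, the truncated snippet looks like it wants to pair each article with every language it mentions and then group by language (in Spark terms: flatMap to (lang, article) pairs, then groupByKey). A plain-Python sketch of that pattern, with made-up articles and a stand-in mentions_language, just to show the shape:

```python
from collections import defaultdict

# Stand-in data; in the original question these are WikipediaArticle objects.
langs = ["Scala", "Java"]
articles = [
    {"title": "A", "text": "Scala and Java"},
    {"title": "B", "text": "Java only"},
]

def mentions_language(article, lang):
    """Stand-in for WikipediaArticle.mentionsLanguage."""
    return lang in article["text"]

# flatMap-style: emit a (lang, article) pair for every match, then group
# by lang -- in Spark this would be rdd.flatMap(...).groupByKey().
grouped = defaultdict(list)
for article in articles:
    for lang in langs:
        if mentions_language(article, lang):
            grouped[lang].append(article["title"])

print(dict(grouped))
# → {'Scala': ['A'], 'Java': ['A', 'B']}
```

The usual Scala pitfall in snippets like the one quoted is emitting values from flatMap without a key, so there is nothing for a grouping step to group on; emitting explicit pairs sidesteps that.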

Re: Something about Spark which has bothered me for a very long time, which I've never understood

2022-05-05 Thread Lalwani, Jayesh
Have you tried taking several thread dumps across executors to see if the executors are consistently waiting for a resource? I suspect it's S3. S3's list operation doesn't scale with the number of keys in a folder. You aren't being throttled by S3; S3 is just slow when you have a lot of small
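The "take several thread dumps across executors" suggestion is scriptable: Spark's monitoring REST API documents a per-executor thread-dump endpoint (/applications/[app-id]/executors/[executor-id]/threads). A sketch of building those URLs; the host, port, app id, and executor ids below are placeholders, and the endpoint requires the Spark UI to be enabled with the application still running:

```python
# Sketch: construct URLs for Spark's REST thread-dump endpoint.
# "driver-host", 4040, "app-123", and the executor ids are placeholders.
def thread_dump_url(ui_host, ui_port, app_id, executor_id):
    return (f"http://{ui_host}:{ui_port}/api/v1/applications/"
            f"{app_id}/executors/{executor_id}/threads")

urls = [thread_dump_url("driver-host", 4040, "app-123", eid)
        for eid in ("1", "2", "3")]
```

Fetching each URL a few times a minute apart and diffing the stack traces shows whether executors sit in the same wait state (e.g. S3 list calls) across samples.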

Something about Spark which has bothered me for a very long time, which I've never understood

2022-05-05 Thread Denarian Kislata
Greetings, and thanks in advance. For ~8 years I've been a Spark user, and I've seen this same problem at more SaaS startups than I can count, and although it's straightforward to fix, I've never understood _why_ it happens. I'm hoping someone can explain the why behind it. Unfortunately I

Kafka Spark Structured Streaming Error

2022-05-05 Thread nayan sharma
Hi, Does anyone have an idea how to fix this error? We are consuming data from a Kafka topic with 105 partitions using Spark Structured Streaming. Every hour 5-6 batches fail because of this. I couldn't find a solution anywhere. 22/05/05 10:37:01 INFO impl.PhysicalFsWriter: ORC writer

Re: Disable/Remove datasources in Spark

2022-05-05 Thread wilson
Btw, I use Drill to query webserver logs only, because Drill has a storage plugin for httpd server logs. But I found Spark is also convenient for querying webserver logs, for which I wrote a note: https://notes.4shield.net/how-to-query-webserver-log-with-spark.html Thanks wilson wrote: though

Re: Disable/Remove datasources in Spark

2022-05-05 Thread wilson
Though this is off-topic, Apache Drill can do that. For instance, you can keep only the CSV storage plugin in the configuration and remove all other storage plugins; then users on Drill can query CSV only. regards Aditya wrote: So, is there a way for me to get a list of "leaf"
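For concreteness, the Drill idea amounts to editing the dfs storage plugin so its formats map contains only a CSV entry. A hedged sketch of such a plugin config; the exact keys and accepted options vary across Drill versions, so check the Drill storage-plugin documentation before relying on this:

```json
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "root": {
      "location": "/data",
      "writable": false,
      "defaultInputFormat": "csv"
    }
  },
  "formats": {
    "csv": { "type": "text", "extensions": ["csv"] }
  }
}
```

With every other storage plugin disabled or deleted, users of that Drill instance can only query CSV files under the configured workspace.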

Re: Disable/Remove datasources in Spark

2022-05-05 Thread Aditya
My understanding is that if I can disable the Parquet datasource, users will get an error when they try spark.read.parquet(). To give context, my main objective is to provide a few dataframes to my users, and I don't want them to be able to access any data other than these specific dataframes. So,
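At the API level, one approach is to hand users an object that exposes only the blessed dataframes, rather than a SparkSession. To be clear, this is a convenience boundary, not a security one: anyone holding a live SparkSession can still call spark.read themselves, as the next reply in the thread points out. A plain-Python sketch, with strings standing in for Spark DataFrames:

```python
class DataFrameCatalog:
    """Expose only a fixed, named set of dataframes to callers."""

    def __init__(self, frames):
        self._frames = dict(frames)  # name -> dataframe

    def names(self):
        return sorted(self._frames)

    def get(self, name):
        if name not in self._frames:
            raise KeyError(f"unknown dataframe: {name!r}")
        return self._frames[name]

# Usage with stand-in values; in practice the values would be DataFrames.
catalog = DataFrameCatalog({"sales": "df_sales", "users": "df_users"})
print(catalog.names())
# → ['sales', 'users']
```

Enforcing this for real requires controlling the environment (e.g. storage-level ACLs, or a SQL gateway where users never hold a session), not just the Python API surface.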

Re: Disable/Remove datasources in Spark

2022-05-05 Thread wilson
It's maybe impossible to disable that? A user can run spark.read... to read any datasource they can reach. Aditya wrote: 2. But I am not able to figure out how to "disable" all other data sources

Disable/Remove datasources in Spark

2022-05-05 Thread Aditya
Hi, I am trying to force all users to use only one datasource (a custom datasource I plan to write) to read/write data. So, I was looking at the DataSource API in Spark: 1. I was able to figure out how to create my own datasource (Reference