virtual_mailbox_maps usage

2022-08-30 Thread frakass
I have a virtual_mailbox_domain: a.com, and I have a virtual_alias_domain: b.com. I can set up this entry in virtual_alias_maps for a domain alias: x...@b.com x...@a.com. But what is virtual_mailbox_maps used for? Thank you.

where to setup virtual_mailbox_maps

2022-08-30 Thread frakass
Hello, I have a domain in virtual_mailbox_domains: aaa.com. I also have virtual_alias_domains, which include bbb.com. I know how to forward x...@bbb.com to y...@aaa.com by setting up the "virtual_alias_maps" file: x...@bbb.com y...@aaa.com (and running postmap after the changes). But how
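
The difference, as a minimal sketch (the domains are the ones named above; paths and local parts are made up): virtual_alias_maps only rewrites recipient addresses, while virtual_mailbox_maps tells Postfix which addresses in a virtual_mailbox_domain it should accept and where, relative to virtual_mailbox_base, to deliver them.

    # main.cf (hypothetical layout)
    virtual_mailbox_domains = aaa.com
    virtual_mailbox_base    = /var/mail/vhosts
    virtual_mailbox_maps    = hash:/etc/postfix/vmailbox
    virtual_alias_domains   = bbb.com
    virtual_alias_maps      = hash:/etc/postfix/virtual

    # /etc/postfix/vmailbox: recipient address -> mailbox path under virtual_mailbox_base
    # (a trailing slash means Maildir-style delivery)
    someone@aaa.com    aaa.com/someone/

    # /etc/postfix/virtual: alias address -> real address
    someone@bbb.com    someone@aaa.com

As with virtual_alias_maps, each hash: file needs postmap run against it after editing.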

Re: sync or async producer

2022-02-15 Thread frakass
host:9092", client_id: "ruby-client", resolve_seed_brokers: true) producer = kafka.producer(required_acks: :all, max_buffer_size: 50_000) 1.times do message = rand.to_s producer.produce(message, topic: "mytest") end producer.deliver_messages Thanks On 2022/2/16 10:18, Luke Ch

sync or async producer

2022-02-15 Thread frakass
For a producer, is there a principle for when to use sync publishing and when to use async publishing? For simple-format messages I have tested both, and their performance is almost the same. Thank you. frakass
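
The thread's code uses ruby-kafka, but the sync/async distinction is easiest to spell out against the standard JVM producer API. Below is a minimal Scala sketch, e.g. for a REPL with the kafka-clients jar on the classpath; the broker address, topic name and acks setting are just the ones mentioned in the thread. Sync means blocking on the Future returned by send(); async means handing the record to the producer's buffer and reacting in a callback.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

    val props = new Properties()
    props.put("bootstrap.servers", "host:9092")
    props.put("acks", "all")                          // same intent as required_acks: :all
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    val record = new ProducerRecord[String, String]("mytest", scala.util.Random.nextDouble().toString)

    // Sync: block until this record is acknowledged; simplest error handling, lowest throughput.
    val meta: RecordMetadata = producer.send(record).get()

    // Async: return immediately, let the producer batch, handle failures in the callback.
    producer.send(record, new Callback {
      override def onCompletion(m: RecordMetadata, e: Exception): Unit =
        if (e != null) e.printStackTrace()
    })

    producer.flush()
    producer.close()

With a handful of small test messages the two paths will indeed look about the same; async batching only pulls ahead when sending many messages, while sync gives a per-message delivery guarantee at the point of send.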

Re: how to classify column

2022-02-11 Thread frakass
That's good, thanks. On 2022/2/12 12:11, Raghavendra Ganesh wrote: .withColumn("newColumn",expr(s"case when score>3 then 'good' else 'bad' end"))
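
Spelled out as a runnable spark-shell sketch (the toy data is made up; the thread only fixes the score column and the >3 threshold):

    import org.apache.spark.sql.functions.{col, expr, when}

    val df = Seq(0, 2, 4, 5).toDF("score")     // toy data for illustration

    // Raghavendra's suggestion: a SQL CASE expression inside expr()
    df.withColumn("newColumn", expr("case when score > 3 then 'good' else 'bad' end")).show()

    // The equivalent with the Column API
    df.withColumn("newColumn", when(col("score") > 3, "good").otherwise("bad")).show()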

how to classify column

2022-02-11 Thread frakass
Hello, I have a column whose value (Int type, a score) ranges from 0 to 5. I want to query it so that when the score > 3 it is classified as "good", else as "bad". How do I implement that? A UDF, something like this? scala> implicit class Foo(i:Int) { | def classAs(f:Int=>String) = f(i)
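
Completing the UDF idea from the post as a sketch (it reuses the toy df with a score column from the sketch above, and the udf name is made up). A UDF works, though the built-in CASE/when-otherwise forms shown earlier are usually preferred because Catalyst can optimize them:

    import org.apache.spark.sql.functions.{col, udf}

    // A UDF doing the same classification as the CASE expression
    val classify = udf((score: Int) => if (score > 3) "good" else "bad")
    df.withColumn("newColumn", classify(col("score"))).show()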

Re: data size exceeds the total ram

2022-02-11 Thread frakass
nown as 3 executors?), are there 3 partitions for each job? 2. Can I increase the number of partitions by hand to improve performance? Thanks On 2022/2/11 6:22, frakass wrote: On 2022/2/11 6:16, Gourav Sengupta wrote: What is the source data (is it JSON, CSV, Parquet, etc)? Where are you reading it from (JDB
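
On point 2, a sketch of what changing the partition count by hand usually looks like (assumes a spark-shell session; the input path and the partition counts are made up):

    val df = spark.read.option("header", "true").csv("/data/input.csv")
    println(df.rdd.getNumPartitions)   // how many partitions Spark chose from the input splits

    val wider    = df.repartition(200) // full shuffle up to 200 partitions (more parallelism)
    val narrower = df.coalesce(10)     // merge down without a full shuffle (fewer, larger tasks)

More partitions only help while each task still has a reasonable amount of work; far more partitions than total cores mostly adds scheduling overhead.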

Re: data size exceeds the total ram

2022-02-11 Thread frakass
On 2022/2/11 6:16, Gourav Sengupta wrote: What is the source data (is it JSON, CSV, Parquet, etc)? Where are you reading it from (JDBC, file, etc)? What is the compression format (GZ, BZIP, etc)? What is the SPARK version that you are using? It's a well-formed CSV file (not compressed)

data size exceeds the total ram

2022-02-11 Thread frakass
Hello, I have three nodes with total memory 128GB x 3 = 384GB, but the input data is about 1TB. How can Spark handle this case? Thanks.
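
A sketch of the usual answer (spark-shell, made-up input path): Spark streams each partition through its task, so the 1TB input does not have to fit into the 384GB of cluster RAM; memory mainly matters for whatever you cache.

    import org.apache.spark.storage.StorageLevel

    // Reading and aggregating never needs the whole input in memory at once:
    // each task works through its own partition.
    val df = spark.read.option("header", "true").csv("/data/one_tb_input/")
    println(df.count())

    // Only if the data must be cached do memory limits bite; MEMORY_AND_DISK
    // spills partitions that do not fit in RAM to local disk instead of failing.
    df.persist(StorageLevel.MEMORY_AND_DISK)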

Re: Using Avro file format with SparkSQL

2022-02-09 Thread frakass
Have you added the dependency in the build.sbt? Can you 'sbt package' the source successfully? regards frakass On 2022/2/10 11:25, Karanika, Anna wrote: For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0
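
A sketch of what that looks like, using the same spark-avro coordinates as in the spark-submit command quoted above (the build.sbt layout and the "provided" scoping are assumptions):

    // build.sbt
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"  % "3.2.0" % "provided",
      "org.apache.spark" %% "spark-avro" % "3.2.0"
    )

    // After `sbt package`, or with --packages org.apache.spark:spark-avro_2.12:3.2.0:
    val df = spark.read.format("avro").load("/path/to/data.avro")   // path is hypothetical
    df.write.format("avro").save("/path/to/out")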

Re: question on the different way of RDD to dataframe

2022-02-08 Thread frakass
I think it's better as: df1.map { case(w,x,y,z) => columns(w,x,y,z) } Thanks On 2022/2/9 12:46, Mich Talebzadeh wrote: scala> val df2 = df1.map(p => columns(p(0).toString,p(1).toString, p(2).toString,p(3).toString.toDouble)) // map those columns
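
Put together as a spark-shell sketch (the case class name columns comes from the thread; the column types are assumptions). The Row-based map works on any DataFrame, while the tuple pattern-match form only typechecks once df1 is already a typed Dataset of 4-tuples, so on a plain DataFrame you would match on Row instead:

    import org.apache.spark.sql.Row
    case class columns(c1: String, c2: String, c3: String, c4: Double)

    // Mich's Row-based version, written out (df1 is any four-column DataFrame):
    val ds1 = df1.map(p => columns(p(0).toString, p(1).toString, p(2).toString, p(3).toString.toDouble))

    // A pattern-matching variant on a plain DataFrame matches Row, not a tuple:
    val ds2 = df1.map { case Row(w: String, x: String, y: String, z: String) =>
      columns(w, x, y, z.toDouble)
    }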

Re: flatMap for dataframe

2022-02-08 Thread frakass
Is this the Scala syntax? Yes, in Scala I know how to do it by converting the df to a dataset. How about PySpark? Thanks On 2022/2/9 10:24, oliver dd wrote: df.flatMap(row => row.getAs[String]("value").split(" "))
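
A sketch of both routes (assumes a spark-shell session and a DataFrame df with a string column named "value", as in oliver's snippet). The explode+split form uses the same function names in pyspark.sql.functions, so it is also the usual answer to the PySpark part of the question:

    import org.apache.spark.sql.functions.{col, explode, split}

    // Typed route: flatMap over the Dataset (uses the implicit String encoder from spark.implicits._)
    val words1 = df.flatMap(row => row.getAs[String]("value").split(" "))

    // Untyped route: explode the array produced by split; the same call works in PySpark
    val words2 = df.select(explode(split(col("value"), " ")).as("word"))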

Re: question on the different way of RDD to dataframe

2022-02-08 Thread frakass
I know that using a case class I can control the data types strictly. scala> val rdd = sc.parallelize(List(("apple",1),("orange",2))) rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:23 scala> rdd.toDF.printSchema root |-- _1: string (nullable = true)
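
Continuing the snippet, a sketch of getting real column names and types instead of _1/_2 (the case class and column names here are made up):

    case class Fruit(name: String, qty: Int)

    val rdd = sc.parallelize(List(("apple", 1), ("orange", 2)))

    // Route 1: map the tuples into the case class
    rdd.map { case (n, q) => Fruit(n, q) }.toDF.printSchema
    // root
    //  |-- name: string (nullable = true)
    //  |-- qty: integer (nullable = false)

    // Route 2: keep the tuples and just name the columns
    rdd.toDF("name", "qty").printSchema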

flatMap for dataframe

2022-02-08 Thread frakass
Hello, for an RDD I can apply the flatMap method: >>> sc.parallelize(["a few words","ba na ba na"]).flatMap(lambda x: x.split(" ")).collect() ['a', 'few', 'words', 'ba', 'na', 'ba', 'na'] But for a dataframe, how can I flatMap it as above? >>> df.show() ++ |

Re: unsubscribe

2022-01-14 Thread frakass
Please send an empty message to: user-unsubscr...@spark.apache.org to unsubscribe yourself from the list. Thanks On 2022/1/15 7:04, ALOK KUMAR SINGH wrote: unsubscribe

Re: groupMapReduce

2022-01-14 Thread frakass
OK thanks, I will check that. On 2022/1/14 7:09, David Diebold wrote: Hello, in the RDD API you must be looking for reduceByKey. Cheers On Fri, 14 Jan 2022 at 11:56, frakass <capitnfrak...@free.fr> wrote: Is there a RDD API which is similar to Scala's group
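
A sketch of the correspondence (groupMapReduce needs Scala 2.13 collections; the word-count data is made up):

    // Plain Scala collections: groupMapReduce(key)(map)(reduce)
    val local = List("spark", "scala", "kafka", "spark")
      .groupMapReduce(identity)(_ => 1)(_ + _)      // Map(spark -> 2, scala -> 1, kafka -> 1)

    // RDD equivalent, as David says: key the records yourself, then reduceByKey
    val counts = sc.parallelize(List("spark", "scala", "kafka", "spark"))
      .map(w => (w, 1))                             // the "group"/"map" part
      .reduceByKey(_ + _)                           // the "reduce" part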

groupMapReduce

2022-01-14 Thread frakass
Is there an RDD API which is similar to Scala's groupMapReduce? https://blog.genuine.com/2019/11/scalas-groupmap-and-groupmapreduce/ Thank you.

Re: about memory size for loading file

2022-01-13 Thread frakass
For this case I have 3 partitions, each processing 3.333 GB of data, am I right? On 2022/1/14 2:20, Sonal Goyal wrote: No it should not. The file would be partitioned and read across each node. On Fri, 14 Jan 2022 at 11:48 AM, frakass <capitnfrak...@free.fr> wrote: H

about memory size for loading file

2022-01-13 Thread frakass
Hello list, given the case I have a file whose size is 10GB. The total RAM of the cluster is 24GB across three nodes, so each node has only 8GB. If I load this file into Spark as an RDD via the sc.textFile interface, will this operation run into an "out of memory" issue? Thank you.
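
A sketch of what actually happens (spark-shell, path made up): sc.textFile is lazy, and each task streams through one partition at a time, so a 10GB file never needs to fit into a single node's 8GB; memory pressure mainly comes from caching or from very wide shuffles.

    val lines = sc.textFile("/data/ten_gb_file.txt")
    println(lines.getNumPartitions)   // on HDFS, roughly one partition per ~128MB block, not one per node
    println(lines.count())            // fine: executors work through partitions a few at a time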