I have a virtual_mailbox_domain:
a.com
and I have a virtual_alias_domain:
b.com
I can set up this entry in virtual_alias_maps for a domain alias:
x...@b.com x...@a.com
But what is virtual_mailbox_maps used for?
Thank you.
Hello,
I have a domain in virtual_mailbox_domains:
aaa.com
I also have virtual_alias_domains, which includes:
bbb.com
I know how to forward x...@bbb.com to y...@aaa.com by setting up the file
"virtual_alias_maps":
x...@bbb.com y...@aaa.com
(and running postmap after the changes).
But, how
host:9092", client_id: "ruby-client",
resolve_seed_brokers: true)
producer = kafka.producer(required_acks: :all, max_buffer_size: 50_000)
1.times do
  message = rand.to_s
  producer.produce(message, topic: "mytest")
end
producer.deliver_messages
Thanks
On 2022/2/16 10:18, Luke Ch
For a producer, is there a rule of thumb for when to use sync publishing
and when to use async publishing?
For simple-format messages I have tested both, and their performance
is almost the same.
Thank you.
frakass
That's good. Thanks.
On 2022/2/12 12:11, Raghavendra Ganesh wrote:
.withColumn("newColumn", expr(s"case when score > 3 then 'good' else 'bad' end"))
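The CASE WHEN expression above is a plain threshold rule. As a minimal sketch of that rule in ordinary Python (not Spark code; the name classify_score and its threshold parameter are made up for illustration):

```python
def classify_score(score: int, threshold: int = 3) -> str:
    """Mirror of: case when score > 3 then 'good' else 'bad' end."""
    return "good" if score > threshold else "bad"


# Apply the rule to every possible score from 0 to 5.
labels = {score: classify_score(score) for score in range(6)}
print(labels)
```

In PySpark the same rule is usually written with when(...).otherwise(...) from pyspark.sql.functions, so a UDF is not needed for a case like this.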
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Hello
I have a column, score, of type Int with values from 0 to 5.
I want a query that classifies a row as "good" when score > 3 and as
"bad" otherwise.
How do I implement that? With a UDF, something like this?
scala> implicit class Foo(i:Int) {
| def classAs(f:Int=>String) = f(i)
nown as 3 executors?), are there 3
partitions for each job?
2. Can I increase the number of partitions by hand to improve performance?
Thanks
On 2022/2/11 6:22, frakass wrote:
On 2022/2/11 6:16, Gourav Sengupta wrote:
What is the source data (is it JSON, CSV, Parquet, etc)? Where are you
reading it from (JDB
On 2022/2/11 6:16, Gourav Sengupta wrote:
What is the source data (is it JSON, CSV, Parquet, etc)? Where are you
reading it from (JDBC, file, etc)? What is the compression format (GZ,
BZIP, etc)? What is the SPARK version that you are using?
It's a well-formed CSV file (not compressed).
Hello
I have three nodes with 128 GB of memory each, 128 GB x 3 = 384 GB in total.
But the input data is about 1 TB.
How can Spark handle this case?
Thanks.
Have you added the dependency to build.sbt?
Does 'sbt package' build the source successfully?
regards
frakass
On 2022/2/10 11:25, Karanika, Anna wrote:
For context, I am invoking spark-submit and adding arguments --packages
org.apache.spark:spark-avro_2.12:3.2.0
I think it's better as:
df1.map { case(w,x,y,z) => columns(w,x,y,z) }
Thanks
On 2022/2/9 12:46, Mich Talebzadeh wrote:
scala> val df2 = df1.map(p => columns(p(0).toString,p(1).toString,
p(2).toString,p(3).toString.toDouble)) // map those columns
Is this Scala syntax?
Yes, in Scala I know how to do it by converting the DataFrame to a Dataset.
How do I do it in PySpark?
Thanks
On 2022/2/9 10:24, oliver dd wrote:
df.flatMap(row => row.getAs[String]("value").split(" "))
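To make the semantics of the Scala one-liner above concrete, here is a minimal plain-Python sketch of the same flatMap over rows. It models each row as a dict with a "value" column; the helper name flat_map_words is hypothetical and this is not Spark code.

```python
from itertools import chain


def flat_map_words(rows, column="value"):
    """Split the given column of every row on spaces and flatten the result."""
    return list(chain.from_iterable(row[column].split(" ") for row in rows))


# Two rows standing in for a single-column DataFrame.
rows = [{"value": "a few words"}, {"value": "ba na ba na"}]
words = flat_map_words(rows)
print(words)
```

In the PySpark DataFrame API the same effect is commonly obtained by combining split and explode from pyspark.sql.functions, or by dropping down to df.rdd.flatMap.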
I know that by using a case class I can control the data types strictly.
scala> val rdd = sc.parallelize(List(("apple",1),("orange",2)))
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0]
at parallelize at <console>:23
scala> rdd.toDF.printSchema
root
|-- _1: string (nullable = true)
Hello
For an RDD I can apply the flatMap method:
>>> sc.parallelize(["a few words","ba na ba na"]).flatMap(lambda x:
x.split(" ")).collect()
['a', 'few', 'words', 'ba', 'na', 'ba', 'na']
But for a DataFrame, how can I do the same flatMap?
>>> df.show()
++
|
Please send an empty message to user-unsubscr...@spark.apache.org to
unsubscribe yourself from the list.
Thanks
On 2022/1/15 7:04, ALOK KUMAR SINGH wrote:
unsubscribe
OK thanks. I will check that.
On 2022/1/14 7:09, David Diebold wrote:
Hello,
In the RDD API, you are probably looking for reduceByKey.
Cheers
On Fri, 14 Jan 2022 at 11:56, frakass <capitnfrak...@free.fr> wrote:
Is there an RDD API which is similar to Scala's group
Is there an RDD API which is similar to Scala's groupMapReduce?
https://blog.genuine.com/2019/11/scalas-groupmap-and-groupmapreduce/
Thank you.
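For readers unfamiliar with groupMapReduce, here is a minimal plain-Python sketch of what it computes: group the elements by a key, map each element to a value, and reduce the values within each group. The function name group_map_reduce and its parameters are chosen for illustration and belong to no library.

```python
from operator import add


def group_map_reduce(items, key, mapper, reducer):
    """Group items by key(item), map each with mapper, fold each group with reducer."""
    result = {}
    for item in items:
        k = key(item)
        v = mapper(item)
        # First value for a key is stored as-is; later ones are folded in.
        result[k] = reducer(result[k], v) if k in result else v
    return result


# Total word length per first letter.
totals = group_map_reduce(["apple", "avocado", "banana"],
                          key=lambda w: w[0], mapper=len, reducer=add)
print(totals)
```

On an RDD the same shape is typically expressed by first mapping each element to a (key, value) pair and then calling reduceByKey with the reducing function.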
So in this case I have 3 partitions, each processing about 3.33 GB of data. Am I right?
On 2022/1/14 2:20, Sonal Goyal wrote:
No, it should not. The file would be partitioned and read across each node.
On Fri, 14 Jan 2022 at 11:48 AM, frakass <capitnfrak...@free.fr> wrote:
H
Hello list
Suppose I have a 10 GB file. The cluster has 24 GB of RAM in total
across three nodes, so each node has only 8 GB.
If I load this file into Spark as an RDD via sc.textFile, will
this operation run into an "out of memory" issue?
Thank you.
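As a rough back-of-the-envelope sketch for the 10 GB file described above: the number of partitions generally follows from the input split size rather than from the number of nodes. The 128 MB split size used here is an assumption (a common HDFS block size); the actual value depends on the file system and configuration, and estimate_partitions is a hypothetical helper, not a Spark API.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_partitions(file_size_bytes, split_size_bytes=128 * MIB):
    """Estimate the partition count as ceil(file size / split size), at least 1."""
    return max(1, math.ceil(file_size_bytes / split_size_bytes))


file_size = 10 * GIB
num_partitions = estimate_partitions(file_size)
mib_per_partition = file_size / num_partitions / MIB
print(num_partitions, mib_per_partition)
```

Under that assumption each task works on a small slice of the file at a time, which is consistent with the reply that the file is partitioned and read across the nodes.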