Hello!
I have a Parquet file that is 60 MB and contains 10 million records.
When I read this file using Spark 2.3.0 with the configuration
spark.sql.files.maxPartitionBytes=1024*1024*2 (= 2 MB), I got 29
partitions, as expected.
Code:
sqlContext.setConf("spark.sql.files.maxPartitionBytes", str(1024 * 1024 * 2))  # 2 MB
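For reference, a minimal sketch of the whole sequence (assuming a local
Spark 2.3.0 session and a hypothetical file name "data.parquet"), setting
the cap and then checking the resulting partition count:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Cap each input partition at 2 MB (the value is a byte count, as a string).
spark.conf.set("spark.sql.files.maxPartitionBytes", str(1024 * 1024 * 2))
df = spark.read.parquet("data.parquet")  # hypothetical path
print(df.rdd.getNumPartitions())  # roughly 60 MB / 2 MB ≈ 30 partitions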
Hi Jürgen,
Have you found any solution or workaround for writing to multiple sinks
from a single source, since we cannot process multiple sinks at a time?
I also have a scenario in ETL where we use a clone component with
multiple sinks fed by a single input streaming dataframe.
Could you keep us posted once you find something?
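In the meantime, a minimal sketch of one workaround we are considering:
start one streaming query per sink from the same source DataFrame, so each
sink runs as its own query (topic names and paths below are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

src = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "test_topic")
       .load())

# Sink 1: archive the raw records as Parquet (hypothetical paths).
q1 = (src.writeStream
      .format("parquet")
      .option("path", "/tmp/archive")
      .option("checkpointLocation", "/tmp/ckpt1")
      .start())

# Sink 2: human-readable console output for debugging.
q2 = (src.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/ckpt2")
      .start())

# Each query tracks its own offsets, so the source is read once per sink.
spark.streams.awaitAnyTermination()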
Hi guys,
What can be inferred by closely watching the event timeline in the Spark
UI? I generally monitor which tasks take the longest and how many are
running in parallel.
What else?
E.g., event timeline from the Spark UI: [screenshot not included]
Thanks,
Aakash.
Hi,
I'm trying to replace values in a nested column in a JSON-based dataframe
using withColumn().
This syntax works for select, filter, etc., returning only the nested
"country" column:
df.select('body.payload.country')
but if I do the same with withColumn(), it creates a new top-level column
literally named "body.payload.country" instead of replacing the nested
field.
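A minimal sketch of the usual workaround, rebuilding the struct explicitly
(this assumes "body" contains only "payload", and "payload" only "country"
plus a hypothetical sibling "city"; any field not listed gets dropped):

from pyspark.sql import functions as F

# withColumn('body.payload.country', ...) would add a flat column with
# that literal name, so instead reconstruct the nested struct:
df = df.withColumn(
    'body',
    F.struct(
        F.struct(
            F.lit('US').alias('country'),             # replacement value (hypothetical)
            F.col('body.payload.city').alias('city')  # hypothetical sibling field, carried over
        ).alias('payload')
    )
)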
Hi all,
I have IoT time series data in Kafka and am reading it into a static dataframe as:
df = spark.read\
.format("kafka")\
.option("zookeeper.connect", "localhost:2181")\
.option("kafka.bootstrap.servers", "localhost:9092")\
.option("subscribe", "test_topic")\
.option("failOnDataLoss", "false")\