I can't exactly reproduce this. Here is what I tried quickly:
import uuid

import findspark
findspark.init()  # noqa

import pyspark
from pyspark.sql import functions as F  # noqa: N812

spark = pyspark.sql.SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    # the original message truncates mid-expression here; a plausible completion:
    [str(uuid.uuid4())] for _ in range(10)
], ['id'])
> We’d like to deploy Spark Workers/Executors and Master (whatever master is
> easiest to talk about since we really don’t care) in pods as we do with the
> other services we use. Replace Spark Master with k8s if you insist. How do
> the executors get deployed?
When running Spark against
We have a machine learning server. It submits various jobs through the
Spark Scala API. The server runs in a pod deployed from a chart by k8s,
and it later uses the Spark API to submit jobs. I guess we find
spark-submit to be a roadblock to our use of Spark, and the k8s support
is fine, but how do you
k8s as master would be nice but doesn’t solve the problem of running the
full cluster and is an orthogonal issue.
We’d like to deploy Spark Workers/Executors and Master (whatever master is
easiest to talk about since we really don’t care) in pods as we do with the
other services we use. Replace
Sorry, I don’t quite follow – why use the Spark standalone cluster as an
in-between layer when one can just deploy the Spark application directly inside
the Helm chart? I’m curious as to what the use case is, since I’m wondering if
there’s something we can improve with respect to the native
Thanks Matt,
Actually, I can't use spark-submit. We submit the Driver programmatically
through the API. But this is not the issue, and using k8s as the master is
also not the issue. You may be right about it being easier, but it doesn't
quite get to the heart of the problem.
We want to orchestrate a bunch of
I would recommend looking into Spark’s native support for running on
Kubernetes. One can start the application against Kubernetes directly,
either using spark-submit in cluster mode or by starting the Spark context
with the right parameters in client mode. See the Running on Kubernetes
documentation.
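For illustration, here is a minimal client-mode sketch in PySpark; the API
server URL, container image, and driver host below are placeholders you
would fill in for your own cluster:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")  # placeholder k8s API server URL
    .appName("k8s-client-mode-demo")
    .config("spark.executor.instances", "2")
    .config("spark.kubernetes.container.image", "my-registry/spark-py:latest")  # hypothetical image
    .config("spark.driver.host", "driver-pod-hostname")  # must be reachable from the executor pods
    .getOrCreate()
)

Spark then launches the executors as pods itself, which is the "how do the
executors get deployed" part of the question.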
Hello:
I am using a UDF to convert a schema to JSON, and based on the JSON schema,
when a schema’s “type” key is “number”, I need to convert the input data
to float. For example, if an “income” field’s type is number and the input
data is “100”, the output should be “100.0”. But the problem is if an original
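A rough sketch of the kind of UDF described, with hypothetical column and
field names:

import json
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def cast_by_schema(value, schema_json):
    # parse the field's JSON schema and coerce "number" fields to float form
    schema = json.loads(schema_json)
    if schema.get("type") == "number" and value is not None:
        return str(float(value))  # "100" -> "100.0"
    return value

cast_by_schema_udf = F.udf(cast_by_schema, StringType())

# given a DataFrame df with an "income" column:
df = df.withColumn("income", cast_by_schema_udf(F.col("income"), F.lit('{"type": "number"}')))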
Hey there,
I think explicitly specifying the partitioning is overcomplicating things,
since hash partitioning is the default behaviour of the partitioner in
Spark. You could simply do a partitionBy and it would use the hash
partitioner by default.
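A minimal sketch of that default behaviour, assuming a pair RDD:

rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)])

# with only a partition count given, partitionBy falls back to hash
# partitioning, so rows with equal keys land in the same partition
partitioned = rdd.partitionBy(4)
print(partitioned.glom().map(len).collect())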
Let me know if I've
Hi All,
I have a simple partition write like below:
df = spark.read.parquet('read-location')
df.write.partitionBy('col1').mode('overwrite').parquet('write-location')
this fails after an hour with a "file already exists (in .staging directory)"
error. Not sure what I am doing wrong here..
--
Use a windowing function to get the "latest" version of the records from
your incoming dataset and then update Oracle with the values, presumably
via a JDBC connector.
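Roughly something like this, with hypothetical column names and placeholder
JDBC options (note the DataFrame JDBC writer only inserts; a true in-place
update would need a staging table or a direct JDBC session):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# keep only the latest record per key, ranked by an update timestamp
# (incoming_df stands in for your incoming dataset)
w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())
latest = (incoming_df
          .withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn"))

# push the result to Oracle over JDBC (URL, table, and credentials are placeholders)
(latest.write
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/SERVICE")
    .option("dbtable", "TARGET_TABLE")
    .option("user", "user")
    .option("password", "password")
    .mode("append")
    .save())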
I hope that helps.
On Mon, 1 Jul 2019 at 14:04, Sachit Murarka wrote:
> Hi Chris,
>
> I have to make sure my DB has updated
Hello everyone,
I wanted to ask about the current state of support for Spark dynamic
allocation, and whether there is an issue where I could track its progress
and missing features.
We've just started evaluating possible alternatives for a production
architectural setup for our use case, and dynamic