unsubscribe

2023-09-19 Thread Danilo Sousa
unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Discrepancy sample standard deviation pyspark and Excel

2023-09-19 Thread Mich Talebzadeh
Hi Helen, Assuming you want to calculate stddev_samp, Spark correctly maps STDDEV to STDDEV_SAMP. In the query below, replace sales with your table name and AMOUNT_SOLD with the column you want to run the calculation on. SELECT

Re: Discrepancy sample standard deviation pyspark and Excel

2023-09-19 Thread Bjørn Jørgensen
from pyspark.sql import SparkSession
from pyspark.sql.functions import stddev_samp, stddev_pop
spark = SparkSession.builder.getOrCreate()
data = [(52.7,), (45.3,), (60.2,), (53.8,), (49.1,), (44.6,), (58.0,), (56.5,), (47.9,), (50.3,)]
df = spark.createDataFrame(data, ["value"])

Create an external table with DataFrameWriterV2

2023-09-19 Thread Christophe Préaud
Hi, I usually create an external Delta table with the command below, using the DataFrameWriter API: df.write.format("delta").option("path", "").saveAsTable("") Now I would like to use the DataFrameWriterV2 API. I have tried the following command: df.writeTo("").using("delta")

Re: Discrepancy sample standard deviation pyspark and Excel

2023-09-19 Thread Sean Owen
PySpark follows SQL databases here. stddev is stddev_samp, and the sample standard deviation is the calculation with the Bessel correction, n-1 in the denominator. stddev_pop is the population standard deviation, with n in the denominator. On Tue, Sep 19, 2023 at 7:13 AM Helene Bøe wrote: > Hi! > > > > I
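
The difference between the two denominators can be checked without Spark. The following is a minimal sketch in plain Python, using the ten values from Bjørn Jørgensen's reply in this thread; the helper name `std_dev` and its `ddof` parameter are illustrative choices, not part of any Spark API. With `ddof=1` it applies the Bessel correction (n-1), which is what stddev_samp and Excel's STDEV.S compute; with `ddof=0` it divides by n, matching stddev_pop and Excel's STDEV.P.

```python
import math

# Sample values taken from the data posted in this thread.
values = [52.7, 45.3, 60.2, 53.8, 49.1, 44.6, 58.0, 56.5, 47.9, 50.3]


def std_dev(xs, ddof):
    """Standard deviation with a configurable denominator.

    ddof=1 divides by n-1 (sample, Bessel-corrected);
    ddof=0 divides by n (population).
    `std_dev` and `ddof` are hypothetical names for this sketch.
    """
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - ddof))


sample_sd = std_dev(values, ddof=1)      # n-1 in the denominator
population_sd = std_dev(values, ddof=0)  # n in the denominator

print("sample standard deviation:    ", sample_sd)
print("population standard deviation:", population_sd)
```

If the Spark and Excel results differ by exactly the factor sqrt(n / (n-1)), the likely cause is that one side is using the sample formula and the other the population formula.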

Spark streaming sourceArchiveDir does not move file to archive directory

2023-09-19 Thread Yunus Emre Gürses
Hello everyone, I'm using Scala and Spark 3.4.1 on Windows 10. While streaming with Spark, I set the `cleanSource` option to "archive" and the `sourceArchiveDir` option to "archived", as in the code below. ``` spark.readStream .option("cleanSource", "archive")

Discrepancy sample standard deviation pyspark and Excel

2023-09-19 Thread Helene Bøe
Hi! I am applying the stddev function (so actually stddev_samp); however, when comparing with the sample standard deviation in Excel, the results do not match. I cannot find in your documentation any more specifics on how the sample standard deviation is calculated, so I cannot compare the

Re: Spark stand-alone mode

2023-09-19 Thread Patrick Tucci
Multiple applications can run at once, but you need to either configure Spark or your applications to allow that. In stand-alone mode, each application attempts to take all resources available by default. This section of the documentation has more details:

Urgent: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem

2023-09-19 Thread Karthick
Subject: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem Dear Spark Community, I recently reached out to the Apache Flink community for assistance with a critical issue we are facing in our IoT platform, which relies on Apache Kafka and real-time data processing. We received some