Dear Team,
I hope this email finds you well. My name is Nikhil Raj, and I am currently
working with Apache Spark on one of my projects, where we are creating an
external table in Spark from a Parquet file.
I am reaching out to seek assistance regarding user authentication
Hi Spark Community,
I have a question regarding the support for User-Defined Functions (UDFs)
in Spark Connect, specifically when using Kubernetes as the Cluster Manager.
According to the Spark documentation, UDFs are supported by default for the
shell and in standalone applications
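A minimal sketch of what this can look like from the PySpark side, assuming a
reachable Spark Connect endpoint (the address below is a placeholder) and a
client-side pyspark version that matches the server:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Connect to a remote Spark Connect server (placeholder address).
spark = SparkSession.builder.remote("sc://spark-connect-server:15002").getOrCreate()

# A plain Python UDF; it is serialized on the client and executed on the cluster.
@udf(returnType=IntegerType())
def plus_one(x: int) -> int:
    return x + 1

spark.range(5).select(plus_one("id").alias("id_plus_one")).show()
```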
user@spark.apache.org
Subject: Re: [spark-graphframes]: Generating incorrect edges
Hi Steve,
Thanks for your statement. I tend to use uuid myself to avoid collisions. This
built-in function generates random IDs that are highly likely to be unique
across systems. My concerns are
Hi everyone,
We’re about to upgrade our Spark clusters from Java 11 and Spark 3.2.1 to Spark
3.5.1.
I know that 3.5.1 is supposed to be fine on Java 17, but will it run OK on Java
21?
Thanks,
Steve C
Hi Folks,
I wanted to check why Spark doesn't create a staging dir while doing an
insertInto on partitioned tables. I'm running the example code below:
```
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
val rdd = sc.parallelize(Seq((1, 5, 1), (2, 1, 2), (4, 4, 3
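```
A hedged PySpark sketch of the same dynamic-partition insertInto flow; the
database, table and column names are assumptions for illustration only, and a
Hive-enabled SparkSession is assumed:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("CREATE DATABASE IF NOT EXISTS test_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS test_db.demo_tbl (a INT, b INT)
    PARTITIONED BY (part INT) STORED AS PARQUET
""")

df = spark.createDataFrame([(1, 5, 1), (2, 1, 2), (4, 4, 3)], ["a", "b", "part"])
# insertInto resolves columns by position, so the partition column goes last.
df.write.insertInto("test_db.demo_tbl")
```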
Hi Team,
We are trying to use *Spark Structured Streaming* for our use case.
We will be joining 2 streaming sources (from Kafka topics) with watermarks.
As time progresses, the records that are prior to the watermark timestamp
are removed from the state. For our use case, we want to *store
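A minimal sketch of the two-stream join described above; the broker address,
topics and column names are assumptions, and the watermark/range values are
placeholders:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()

left = (spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "topic_a").load()
        .selectExpr("CAST(value AS STRING) AS a_key", "timestamp AS a_ts")
        .withWatermark("a_ts", "10 minutes"))

right = (spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "topic_b").load()
         .selectExpr("CAST(value AS STRING) AS b_key", "timestamp AS b_ts")
         .withWatermark("b_ts", "10 minutes"))

# State older than the watermark plus the join range is dropped, which is the
# behaviour the question above is concerned with.
joined = left.join(
    right,
    expr("a_key = b_key AND b_ts BETWEEN a_ts AND a_ts + interval 15 minutes"))
```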
Hi Kartrick,
Unfortunately Materialised views are not available in Spark as yet. I
raised Jira [SPARK-48117] Spark Materialized Views: Improve Query
Performance and Data Management - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/SPARK-48117> as a feature request.
Let me
the view data into elastic index by using cdc?
Thanks in advance.
On Fri, May 3, 2024 at 3:39 PM Mich Talebzadeh
wrote:
> My recommendation: using materialized views (MVs) created in Hive with
> Spark Structured Streaming and Change Data Capture (CDC) is a good
> combination for ef
Hi,
I have raised a ticket SPARK-48117
<https://issues.apache.org/jira/browse/SPARK-48117> for enhancing Spark
capabilities with Materialised Views (MV). Currently both Hive and
Databricks support this. I have added these potential benefits to the
ticket
-* Improved Query Perfo
Sadly, it sounds like Apache Spark has nothing to do with materialised
views. I was hoping it could read them!
>>> *spark.sql("SELECT * FROM test.mv <http://test.mv>").show()*
Traceback (most recent call last):
File "", line 1, in
File "/opt/spark/p
Dear Spark Community,
I'm writing to seek your expertise in optimizing the performance of our
Spark History Server (SHS) deployed on Amazon EKS. We're encountering
timeouts (HTTP 504) when loading large event logs exceeding 5 GB.
*Our Setup:*
- Deployment: SHS on EKS with Nginx ingress (idle
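One knob that is often relevant here (a hedged sketch; the config names come
from the Spark monitoring docs, the sizes are placeholders to tune) is rolling
event logs, so the SHS does not have to parse a single multi-GB file:
```python
from pyspark.sql import SparkSession

# Application side: write rolling event logs instead of one monolithic file.
spark = (SparkSession.builder
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.rolling.enabled", "true")
    .config("spark.eventLog.rolling.maxFileSize", "128m")
    .getOrCreate())

# History Server side (spark-defaults.conf of the SHS pod), to compact old rolled files:
# spark.history.fs.eventLog.rolling.maxFilesToRetain  10
```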
My recommendation: using materialized views (MVs) created in Hive with
Spark Structured Streaming and Change Data Capture (CDC) is a good
combination for efficiently streaming view data updates in your scenario.
HTH
Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI
Thanks for the comments I received.
So in summary, Apache Spark itself doesn't directly manage materialized
views (MVs), but it can work with them through integration with the
underlying data storage systems like Hive or through Iceberg. I believe
Databricks supports MVs through Unity Catalog.
(removing dev@ as I don't think this is dev@ related thread but more about
"question")
My understanding is that Apache Spark does not support materialized views.
That's all. IMHO it's not a reasonable expectation that all operations in
Apache Hive will be supported in Apache Spark. They are
I do not think the issue is with DROP MATERIALIZED VIEW only, but also with
CREATE MATERIALIZED VIEW, because neither is supported in Spark. I guess
you must have created the view from Hive and are trying to drop it from
Spark, and that is why you are running into the issue with DROP first.
I encountered an issue while working with Materialized Views in Spark SQL.
It appears that there is an inconsistency between the behavior of
Materialized Views in Spark SQL and Hive.
When attempting to execute a statement like DROP MATERIALIZED VIEW IF
EXISTS test.mv in Spark SQL, I encountered
from view definition) by using spark structured
streaming.
Issue:
1. Here we are facing an issue - for each incoming id we run the view
definition (so it will read all the data) and check if any of the incoming
ids is present in the collective ids of the view result, due to which
Hi Steve,
Thanks for your statement. I tend to use uuid myself to avoid
collisions. This built-in function generates random IDs that are highly
likely to be unique across systems. My concerns are on edge so to speak. If
the Spark application runs for a very long time or encounters restarts
Hi Mich,
I was just reading random questions on the user list when I noticed that you
said:
On 25 Apr 2024, at 2:12 AM, Mich Talebzadeh wrote:
1) You are using monotonically_increasing_id(), which is not
collision-resistant in distributed environments like Spark. Multiple hosts
can
My suggestions:
- Increase Executor Memory: Allocate more memory per executor (e.g., 2GB
or 3GB) to allow for multiple executors within available cluster memory.
- Adjust Driver Pod Resources: Ensure the driver pod has enough memory
to run Spark and manage executors.
- Optimize
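A hedged sketch of how those suggestions might look as SparkSession configs;
the actual values are placeholders to size against the cluster:
```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config("spark.executor.memory", "2g")          # more memory per executor
    .config("spark.executor.instances", "2")
    .config("spark.driver.memory", "2g")            # enough for the driver pod
    .config("spark.kubernetes.driver.request.cores", "1")
    .getOrCreate())
```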
Respected Sir/Madam,
I am Tarunraghav. I have a query regarding Spark on Kubernetes.
We have an EKS cluster, within which we have Spark installed in the pods.
We set the executor memory as 1GB and the executor instances as 2, and I
have also set dynamic allocation to true. So when I try to read
Thank you.
My main purpose is to pass "MAXDOP 1" to MSSQL to control CPU usage. From the
official doc, I guess the problem with my code is that Spark wraps the query as
select * from (SELECT TOP 10 * FROM dbo.Demo with (nolock) WHERE Id = 1 option
(maxdop 1)) spark_gen_alias
"128G"
).set("spark.executor.memoryOverhead", "32G"
).set("spark.driver.cores", "16"
).set("spark.driver.memory", "64G"
)
I don't think b) applies as it's a single machine.
Kind regards,
Jelle
Fr
OK let us have a look at these
1) You are using monotonically_increasing_id(), which is not
collision-resistant in distributed environments like Spark. Multiple hosts
can generate the same ID. I suggest switching to UUIDs (e.g.,
uuid.uuid4()) for guaranteed uniqueness.
2) Missing values
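A minimal sketch of the uuid() suggestion in PySpark; the column name is an
assumption. Note that uuid() is non-deterministic, so the result should be
persisted or checkpointed if the same IDs must survive recomputation:
```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(3)
# The built-in uuid() SQL function generates a random UUID per row.
df_with_ids = df.withColumn("vertex_id", F.expr("uuid()"))
df_with_ids.show(truncate=False)
```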
___
From: Mich Talebzadeh
Sent: Wednesday, April 24, 2024 4:40 PM
To: Nijland, J.G.W. (Jelle, Student M-CS)
Cc: user@spark.apache.org
Subject: Re: [spark-graphframes]: Generating incorrect edges
OK, a few observations:
1) ID Generation Method: How are you generating unique IDs (UUIDs, seque
jl...@student.utwente.nl> wrote:
> tags: pyspark,spark-graphframes
>
> Hello,
>
> I am running pyspark in a podman container and I have issues with
> incorrect edges when I build my graph.
> I start with loading a source dataframe from a parquet directory on my
&
You might be able to leverage the prepareQuery option, that is at
https://spark.apache.org/docs/3.5.1/sql-data-sources-jdbc.html#data-source-option
... this was introduced in Spark 3.4.0 to handle temp table query and CTE
query against MSSQL server since what you send in is not actually what
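A hedged sketch of what using prepareQuery could look like for the MAXDOP case
discussed earlier in this thread; the URL, table name and credentials are
placeholders, and the exact SQL is illustrative only:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# prepareQuery (Spark 3.4.0+) is sent as-is ahead of the wrapped query, so the
# OPTION (MAXDOP 1) hint is not folded into the generated subquery alias.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://mssql-host:1433;databaseName=demo")
      .option("prepareQuery",
              "SELECT TOP 10 * INTO #demo_tmp FROM dbo.Demo WITH (NOLOCK) "
              "WHERE Id = 1 OPTION (MAXDOP 1);")
      .option("query", "SELECT * FROM #demo_tmp")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())
```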
tags: pyspark,spark-graphframes
Hello,
I am running pyspark in a podman container and I have issues with incorrect
edges when I build my graph.
I start with loading a source dataframe from a parquet directory on my server.
The source dataframe has the following columns
[QUESTION] How to pass MAXDOP option · Issue #2395 · microsoft/mssql-jdbc
(github.com)
Hi team,
I was advised to seek help from the Spark community.
We suspect Spark rewrites the query before passing it to MS SQL, and this
leads to a syntax error.
Is there any workaround to make my code work?
In Flink, you can create streaming tables using Flink SQL, and connect
directly with SQL through CDC and Kafka. How can you use SQL for streaming
computation in Spark?
308027...@qq.com
I want to use Spark JDBC to access the Alibaba Cloud Hologres
(https://www.alibabacloud.com/product/hologres) internal hidden column
`hg_binlog_timestamp_us` but got the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot
resolve 'hg_binlog_ti
Hello,
In my organization, we have an accounting system for spark jobs that uses
the task execution time to determine how much time a spark job uses the
executors for and we use it as a way to segregate cost. We sum all the task
times per job and apply proportions. Our clusters follow a 1 task
Hi all,
Is it possible to integrate StreamingQueryListener with Spark metrics so
that metrics can be reported through Spark's internal metric system?
Ideally, I would like to report some custom metrics through
StreamingQueryListener and export them to Spark's JmxSink.
Best,
Mason
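A minimal sketch of a custom listener (PySpark 3.4+); getting these numbers into
Spark's internal metric system / JmxSink is not shown here and would need extra
plumbing, so this only logs the progress events:
```python
from pyspark.sql import SparkSession
from pyspark.sql.streaming import StreamingQueryListener

spark = SparkSession.builder.getOrCreate()

class ProgressListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"query started: {event.id}")

    def onQueryProgress(self, event):
        p = event.progress
        print(f"{p.name}: numInputRows={p.numInputRows}, "
              f"processedRowsPerSecond={p.processedRowsPerSecond}")

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        print(f"query terminated: {event.id}")

spark.streams.addListener(ProgressListener())
```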
We are happy to announce the availability of Apache Spark 3.4.3!
Spark 3.4.3 is a maintenance release containing many fixes in the
security and correctness domains. This release is based on the
branch-3.4 maintenance branch of Spark. We strongly
recommend all 3.4 users to upgrade
Hello,
I'm very new to the Spark ecosystem, apologies if this question is a bit
simple.
I want to modify a custom fork of Spark to remove function support. For
example, I want to remove the query runner's ability to call reflect and
java_method. I saw that there exists a data structure in spark
(slice library, used by trino)
https://github.com/airlift/slice/blob/master/src/main/java/io/airlift/slice/XxHash64.java
Was there a special motivation behind this? Or is 42 just used for the sake
of the Hitchhiker's Guide reference? It's very common for Spark to interact
with other tools (either via
h consumes committed
messages from Kafka directly (which is not so scalable, I think).
But the main point of this approach, which I need, is that the Spark
session needs to be used to save the RDD (parallelized consumed messages) to
an Iceberg table.
Consumed messages will be converted to a Spark RDD which wil
Interesting
My concern is the infinite loop in *foreachRDD*: the *while(true)* loop within
foreachRDD creates an infinite loop within each Spark executor. This might
not be the most efficient approach, especially since offsets are committed
asynchronously.
HTH
Mich Talebzadeh,
Technologist
Because Spark Streaming for Kafka transactions does not work correctly to
suit my needs, I moved to another approach using a raw Kafka consumer which
handles read_committed messages from Kafka correctly.
My code looks like the following.
JavaDStream stream = ssc.receiverStream(new CustomReceiver
)
(chango-private-1.chango.private executor driver):
java.lang.IllegalArgumentException: requirement failed: Got wrong record
for spark-executor-school-student-group school-student-7 even after seeking
to offset 11206961 got offset 11206962 instead. If this is a compacted
topic, consider enabling
Hi Kidong,
There may be a few potential reasons why the message counts from your Kafka
producer and Spark Streaming consumer might not match, especially with
transactional messages and read_committed isolation level.
1) Just ensure that both your Spark Streaming job and the Kafka consumer
written
Hi,
I have a Kafka producer which sends messages transactionally to Kafka and a
Spark Streaming job which should consume read_committed messages from Kafka.
But there is a problem with Spark Streaming consuming read_committed
messages.
The count of messages sent by kafka producer transactionally
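For reference, a hedged Structured Streaming sketch of consuming only committed
transactional messages; the broker address and topic name are placeholders.
Plain Kafka consumer properties are passed through with the "kafka." prefix:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .option("kafka.isolation.level", "read_committed")  # skip aborted transactions
          .option("startingOffsets", "earliest")
          .load())
```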
convention for Spark DataFrames
(usually snake_case). Use snake_case for better readability like:
"total_price_in_millions_gbp"
So this is the gist
+--+-+---+
|district |NumberOfOffshoreOwned|total_p
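A small hedged sketch of bulk-renaming columns to snake_case; the sample data
and column names are made up for illustration:
```python
import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def to_snake(name: str) -> str:
    # "NumberOfOffshoreOwned" -> "number_of_offshore_owned"
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

df = spark.createDataFrame(
    [("Hammersmith", 10, 12.5)],
    ["district", "NumberOfOffshoreOwned", "TotalPriceInMillionsGBP"])

renamed = df.toDF(*[to_snake(c) for c in df.columns])
# Columns become: district, number_of_offshore_owned, total_price_in_millions_gbp
```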
I think this answers your question about what to do if you need more space
on nodes.
https://spark.apache.org/docs/latest/running-on-kubernetes.html#local-storage
Local Storage
<https://spark.apache.org/docs/latest/running-on-kubernetes.html#local-storage>
Spark supports using volumes to
Hi Mich,
Thanks for the reply.
I did come across that file but it didn't align with the appearance of
`PartitionedFile`:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
In fact, the code snippet you shared also
thanks everyone for their contributions
>>
>> I was going to reply to @Enrico Minack but
>> noticed additional info. As I understand for example, Apache Uniffle is an
>> incubating project aimed at providing a pluggable shuffle service for
>> Spark. So basically, all these &quo
ons
>
> I was going to reply to @Enrico Minack but
> noticed additional info. As I understand for example, Apache Uniffle is an
> incubating project aimed at providing a pluggable shuffle service for
> Spark. So basically, all these "external shuffle services" have in c
Interesting. So below should be the corrected code with the suggestion in
the [SPARK-47718] .sql() does not recognize watermark defined upstream -
ASF JIRA (apache.org) <https://issues.apache.org/jira/browse/SPARK-47718>
# Define schema for parsing Kafka messages
schema = Stru
Sorry this is not a bug but essentially a user error. Spark throws a really
confusing error and I'm also confused. Please see the reply in the ticket
for how to make things correct.
https://issues.apache.org/jira/browse/SPARK-47718
刘唯 wrote on Sat, 6 Apr 2024 at 11:41:
> This indeed looks like a bug
If you're using just Spark you could try turning on the history server
<https://spark.apache.org/docs/latest/monitoring.html> and try to glean
statistics from there. But there is no one location or log file which
stores them all.
Databricks, which is a managed Spark solution, pr
Hi,
I believe this is the package
https://raw.githubusercontent.com/apache/spark/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartition.scala
And the code
case class FilePartition(index: Int, files: Array[PartitionedFile])
extends Partition
Well, you can do a fair bit with the available tools.
The Spark UI, particularly the Stages and Executors tabs, provides some
valuable insights related to database health metrics for applications using
a JDBC source.
Stage Overview:
This section provides a summary of all the stages executed
Hi All,
I've been diving into the source code to get a better understanding of how
file splitting works from a user perspective. I've hit a dead end at
`PartitionedFile`, for which I cannot seem to find a definition. It appears
as though it should be found at
Hi,
First thanks everyone for their contributions
I was going to reply to @Enrico Minack but
noticed additional info. As I understand for example, Apache Uniffle is an
incubating project aimed at providing a pluggable shuffle service for
Spark. So basically, all these "external sh
I see that both Uniffle and Celeborn support S3/HDFS backends, which is
great.
In case someone is using S3/HDFS, I wonder what would be the advantages
of using Celeborn or Uniffle vs the IBM shuffle service plugin
<https://github.com/IBM/spark-s3-shuffle> or Cloud Shuffle Storage Plugin
fr
Hello, I have a Spark application with a JDBC source that does some calculations.
To monitor application health, I need DB-related metrics per database, like the
number of connections, SQL execution time and SQL fired-time distribution, etc.
Does anybody know how to get them? Thanks!
did
>
> The configurations below can be used with k8s deployments of Spark. Spark
> applications running on k8s can utilize these configurations to seamlessly
> access data stored in Google Cloud Storage (GCS) and Amazon S3.
>
> For Google GCS we may have
There is Apache incubator project Uniffle:
https://github.com/apache/incubator-uniffle
It stores shuffle data on remote servers in memory, on local disk and HDFS.
Cheers,
Enrico
On 06.04.24 at 15:41, Mich Talebzadeh wrote:
I have seen some older references for shuffle service for k8s,
The type-safe example given at
https://spark.apache.org/docs/latest/sql-ref-functions-udf-aggregate.html
fails with a not-serializable exception.
Is this a known issue?
On Sun, 7 Apr 2024 at 15:08, Cheng Pan wrote:
> Instead of the External Shuffle Service, Apache Celeborn might be a good option
> as a Remote Shuffle Service for Spark on K8s.
>
> There are some
There is an IBM shuffle service plugin that supports S3
https://github.com/IBM/spark-s3-shuffle
Though I would think a feature like this could be a part of the main Spark
repo. Trino already has out-of-box support for s3 exchange (shuffle) and
it's very useful.
Vakaris
On Sun, Apr 7, 2024 at 12
Instead of the External Shuffle Service, Apache Celeborn might be a good option as a
Remote Shuffle Service for Spark on K8s.
There are some useful resources you might be interested in.
[1] https://celeborn.apache.org/
[2] https://www.youtube.com/watch?v=s5xOtG6Venw
[3] https://github.com/aws
Splendid
The configurations below can be used with k8s deployments of Spark. Spark
applications running on k8s can utilize these configurations to seamlessly
access data stored in Google Cloud Storage (GCS) and Amazon S3.
For Google GCS we may have
spark_config_gcs
On Sat, 6 Apr 2024 at 21:28, Bjørn Jørgensen
wrote:
> You can make a PVC on K8s and call it 300GB
>
> make a folder in your Dockerfile
> WORKDIR /opt/spark/work-dir
> RUN chmod g+w /opt
You can make a PVC on K8s and call it 300GB.
Make a folder in your Dockerfile:
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
Start Spark adding this:
.config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName",
"300
n Kafka). However, your query
> involves a streaming aggregation: group by provinceId, window('createTime',
> '1 hour', '30 minutes'). The problem is that Spark Structured Streaming
> requires a watermark to ensure exactly-once processing when using
> aggregations with append mode. Your c
[[VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)](
https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb)
Apache Spark 4.0.0 Release Plan
===
1. After creating `branch-3.5`, set "4.0.0-SNAPSHOT" in master branch.
2. Creating `branch-4.0
I have seen some older references for shuffle service for k8s,
although it is not clear they are talking about a generic shuffle
service for k8s.
Anyhow with the advent of genai and the need to allow for a larger
volume of data, I was wondering if there has been any more work on
this matter.
I don't really understand how Iceberg and the hadoop libraries can coexist in a
deployment.
The latest Spark (3.5.1) base image contains the hadoop-client*-3.3.4.jar. The
AWS v2 SDK is only supported in hadoop*-3.4.0.jar and onward.
Iceberg AWS integration states AWS v2 SDK is
required<ht
(ParquetFileFormat.scala:429)
From: Oxlade, Dan
Sent: 03 April 2024 14:33
To: Aaron Grubb ; user@spark.apache.org
Subject: Re: [EXTERNAL] Re: [Spark]: Spark / Iceberg / hadoop-aws compatibility
matrix
[sorry; replying all this time]
With hadoop-*-3.3.6 in place of the 3.4.0
April 2024 13:52
To: user@spark.apache.org
Subject: [EXTERNAL] Re: [Spark]: Spark / Iceberg / hadoop-aws compatibility
matrix
Downgrade to hadoop-*:3.3.x, Hadoop 3.4.x is based on the AWS SDK v2 and should
probably be considered as breaking for tools that build on < 3.4.0 while using
ect: [Spark]: Spark / Iceberg / hadoop-aws compatibility matrix
Hi all,
I’ve struggled with this for quite some time.
My requirement is to read a Parquet file from S3 into a DataFrame, then append
to an existing Iceberg table.
In order to read the parquet I need the hadoop-aws dependency for
. Both of these
dependencies have a transitive dependency on the aws SDK. I can't find versions
for Spark 3.4 that work together.
Current Versions:
Spark 3.4.1
iceberg-spark-runtime-3.4-2.12:1.4.1
iceberg-aws-bundle:1.4.1
hadoop-aws:3.4.0
hadoop-common:3.4.0
I've tried a number of combinations
is designed for
scenarios where you want to append new data to an existing dataset at the
sink (in this case, the "sink" topic in Kafka). However, your query
involves a streaming aggregation: group by provinceId, window('createTime',
'1 hour', '30 minutes'). The problem is that Spark
rue") \
> .option("startingOffsets", "earliest") \
> .load() \
> .select(from_json(col("value").cast("string"),
> schema).alias("parsed_value"))
> .select
rom the streaming DataFrame with watermark
streaming_df.createOrReplaceTempView("michboy")
# Execute SQL queries on the temporary view
result_df = (spark.sql("""
SELECT
window.start, window.end, provinceId, sum(payAmount) as
totalPayAmount
FROM michboy
Hello!
I am attempting to write a streaming pipeline that would consume data from a
Kafka source, manipulate the data, and then write results to a downstream sink
(Kafka, Redis, etc). I want to write fully formed SQL instead of using the
function API that Spark offers. I read a few guides
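A minimal sketch of that SQL-first pattern (a simple projection/filter, leaving
aside the watermark/aggregation subtleties discussed elsewhere in this list);
the broker, topics, schema and checkpoint path are assumptions:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events_in")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))

# Express the transformation itself as plain SQL against a temp view.
events.createOrReplaceTempView("events")
result = spark.sql("SELECT user_id, action, event_time FROM events WHERE action = 'purchase'")

query = (result
         .selectExpr("CAST(user_id AS STRING) AS key", "to_json(struct(*)) AS value")
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("topic", "events_out")
         .option("checkpointLocation", "/tmp/checkpoints/events_out")
         .start())
```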
Hi Team,
Can you let us know when Spark 4.x will be released to Maven?
regards,
Parul
From: Bjørn Jørgensen
Sent: Wednesday, February 28, 2024 5:06:54 PM
To: Chawla, Parul
Cc: Sahni, Ashima
Hello, I've got a project which has to use the newest versions of both Apache
Spark and Spring Boot due to vulnerability issues. I build my project using
Gradle. And when I try to run it I get an unsatisfied dependency exception
about javax/servlet/Servlet. I've tried to add the Jakarta Servlet
I am trying to understand the Spark architecture for my upcoming
certification; however, there seems to be conflicting information available.
https://stackoverflow.com/questions/47782099/what-is-the-relationship-between-tasks-and-partitions
Does Spark assign a Spark partition to only a single
+1
--
Thank You & Best Regards
Winston Lai
From: Jay Han
Date: Sunday, 24 March 2024 at 08:39
To: Kiran Kumar Dusi
Cc: Farshid Ashouri , Matei Zaharia
, Mich Talebzadeh , Spark
dev list , user @spark
Subject: Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark
Communit
> Some of you may be aware that Databricks community Home | Databricks
>>> have just launched a knowledge sharing hub. I thought it would be a
>>> good idea for the Apache Spark user group to have the same, especially
>>> for repeat questions on Spark core, Spark SQL, Spa
Sorry from this link
Leveraging Generative AI with Apache Spark: Transforming Data Engineering |
LinkedIn
<https://www.linkedin.com/pulse/leveraging-generative-ai-apache-spark-transforming-mich-lxbte/?trackingId=aqZMBOg4O1KYRB4Una7NEg%3D%3D>
Mich Talebzadeh,
Technologist | Data | Generat
You may find this link of mine on LinkedIn for the said article. We
can use LinkedIn for now.
Leveraging Generative AI with Apache Spark: Transforming Data
Engineering | LinkedIn
Mich Talebzadeh,
Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom
view my Linkedin
>> good idea for the Apache Spark user group to have the same, especially
>> for repeat questions on Spark core, Spark SQL, Spark Structured
>> Streaming, Spark Mlib and so forth.
>>
>> Apache Spark user and dev groups have been around for a good while.
>> Th
+1
On Mon, 18 Mar 2024, 11:00 Mich Talebzadeh,
wrote:
> Some of you may be aware that Databricks community Home | Databricks
> have just launched a knowledge sharing hub. I thought it would be a
> good idea for the Apache Spark user group to have the same, especially
> for repe
Hi Team,
We're encountering an issue with Spark UI.
I've documented the details here:
https://issues.apache.org/jira/browse/SPARK-47232
When reverse proxy is enabled in the master and worker configOptions, we're not
able to access the different tabs available in the Spark UI, e.g. stages,
environment, storage, etc
On Mon, 18 Mar 2024 at 20:31, Bjørn Jørgensen
mailto:bjornjorgen...@gmail.com>> wrote:
something like this Spark community ·
GitHub<https://github.com/Spark-community>
On Mon, 18 Mar 2024 at 17:26, Parsian, Mahmoud wrote:
Good idea. Will be useful
+1
From: ashok34...@yahoo.
+1 Great initiative.
QQ: Stack Overflow has a similar feature called "Collectives", but I am
not sure of the expense of creating one for Apache Spark. With SO being used
(at least before ChatGPT became quite the norm for searching questions), it
already has a lot of questions asked an
>> *From: *ashok34...@yahoo.com.INVALID
>> *Date: *Monday, March 18, 2024 at 6:36 AM
>> *To: *user @spark , Spark dev list <
>> d...@spark.apache.org>, Mich Talebzadeh
>> *Cc: *Matei Zaharia
>> *Subject: *R
> On Mon, 18 Mar 2024 at 16:23, Parsian, Mahmoud
> wrote:
>
>> Good idea. Will be useful
>> +1
>> *From: *ashok34...@yahoo.com.INVALI
OK thanks for the update.
What does officially blessed signify here? Can we have and run it as a
sister site? The reason this comes to my mind is that the interested
parties should have easy access to this site (from ISUG Spark sites) as a
reference repository. I guess the advice would
On Mon, 18 Mar 2024 at 20:31, Bjørn Jørgensen
wrote:
> something like this Spark community · GitHub
> <https://github.com/Spark-community>
>
>
> man. 18. m
something like this Spark community · GitHub
<https://github.com/Spark-community>
On Mon, 18 Mar 2024 at 17:26, Parsian, Mahmoud wrote:
> Good idea. Will be useful
> +1
> *From: *ashok34...@yahoo.com.INVALID
> *Date: *Monda
+1
Thanks for proposing
On Mon, Mar 18, 2024 at 9:25 AM Parsian, Mahmoud
wrote:
> Good idea. Will be useful
> +1
> *From: *ashok34...@yahoo.com.INVALID
> *Date: *Monday, March 18, 2024 at 6:36 AM
> *To: *user @spark , Sp
Hi,
I have a specific problem where I have to get the data from REST APIs and
store it, then do some transformations on it and then write to an RDBMS
table.
I am wondering if Spark will help in this regard.
I am confused as to how I should store the data while I actually acquire it on
the driver