Dear Sean,
I do agree with you to a certain extent; that makes sense. Perhaps I am wrong in
asking for native integrations rather than depending on over-engineered
external solutions, which have their own performance issues and bottlenecks
in live production environments. But asking and stating one's
Hi Bitfox,
Yes, distributed training using PyTorch and TensorFlow is really superb, and
you are spot on: there is actually no need for solutions like Ray, Petastorm,
etc.
But in case I want to pre-process data in Spark and push the results to
these deep learning libraries,
We have a Spark program that iterates through a while loop over the same
input DataFrame and produces different results per iteration. I can see
in the Spark UI that the workload is concentrated on a single core of
the same worker. Is there any way to distribute the workload to
different
Thank you.
From: Peyman Mohajerian [mailto:mohaj...@gmail.com]
Sent: Thursday, February 24, 2022 9:00 AM
To: Michael Williams (SSI)
Cc: user@spark.apache.org
Subject: Re: Consuming from Kafka to delta table - stream or batch mode?
If you want to batch-consume from Kafka, the trigger-once config would work
with Structured Streaming, and you get the benefit of checkpointing.
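A configuration sketch of what Peyman describes, under stated assumptions: a Structured Streaming query with a one-shot trigger drains whatever is available in Kafka and then stops, while the checkpoint records consumed offsets between runs. The broker address, topic names, and paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-batch-via-streaming").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "topic-a,topic-b")            # placeholder topics
          .load())

(stream.writeStream
 .format("delta")
 .option("checkpointLocation", "/chk/kafka-delta")  # offsets survive between runs
 .trigger(once=True)                                # drain and stop: batch-style run
 .start("/tables/events")                           # placeholder delta table path
 .awaitTermination())
```

Re-running this job picks up from the checkpointed offsets, so it behaves like an incremental batch job without hand-managing Kafka offsets.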
On Thu, Feb 24, 2022 at 6:07 AM Michael Williams (SSI) <
michael.willi...@ssigroup.com> wrote:
> Hello,
>
> Our team is working with Spark (for the
What is the vulnerability, and does it affect Spark? What is the remediation?
Can you try updating these and open a pull request if it works?
On Thu, Feb 24, 2022 at 7:28 AM vinodh palanisamy wrote:
> Hi Team,
> We are using spark-core_2.13:3.2.1 in our project. Where in that
> version
Hello,
Our team is working with Spark (for the first time) and one of the sources we
need to consume is Kafka (multiple topics). Are there any practical or
operational issues to be aware of when deciding whether to a) consume in
batches until all messages are consumed, then shut down the Spark
On the contrary, distributed deep learning is not data parallel. It's
dominated by the need to share parameters across workers.
Gourav, I don't understand what you're looking for. Have you looked at
Petastorm and Horovod? They _use Spark_, not another platform like Ray. Why
recreate this, which has
Hi Team,
We are using spark-core_2.13:3.2.1 in our project, where a Black Duck scan
reports the JS files below as vulnerable:
dataTables.bootstrap4.1.10.20.min.js
jquery.dataTables.1.10.20.min.js
Please let me know whether this can be fixed in my project or DataTables
I have been using TensorFlow for a long time; it's not hard to implement a
distributed training job at all, either by model parallelism or data
parallelism. I don't think there is much need to develop Spark to
support TensorFlow jobs. Just my thoughts...
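The data-parallel idea mentioned above can be shown without any framework; a minimal, self-contained sketch: each "worker" computes the gradient on its own shard, the gradients are averaged (the parameter-sharing step the thread keeps coming back to), and every worker applies the same update. Linear regression with a squared loss keeps the math obvious; all numbers here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                      # noiseless targets for a clean illustration

def grad(w, Xs, ys):
    # gradient of the mean squared error on one worker's shard
    return Xs.T @ (Xs @ w - ys) / len(ys)

n_workers = 4
shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))

w = np.zeros(3)
for _ in range(500):
    # "all-reduce": average the per-shard gradients, then apply the same
    # update on every worker so the replicas stay in sync
    g = np.mean([grad(w, Xs, ys) for Xs, ys in shards], axis=0)
    w -= 0.1 * g

print(np.round(w, 3))
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so the distributed run converges to the same weights a single worker would find; real frameworks add exactly this synchronization, just efficiently over a network.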
On Thu, Feb 24, 2022 at 4:36 PM
Hi,
I do not think that there is any reason for using over-engineered platforms
like Petastorm and Ray, except for certain use cases.
What Ray is doing could, I think, have easily been done by Spark, had the
open source community got that steer. But maybe I am wrong