Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Gourav Sengupta
Dear Sean, I do agree with you to a certain extent; that makes sense. Perhaps I am wrong in asking for native integrations rather than depending on over-engineered external solutions, which have their own performance issues and bottlenecks in a live production environment. But asking and stating one's

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Gourav Sengupta
Hi Bitfox, yes, distributed training using PyTorch and TensorFlow is really superb and you are spot on. There is actually absolutely no need for solutions like Ray, Petastorm, etc. But in case I want to preprocess data in Spark and push the results to these deep learning libraries,

Non-Partition based Workload Distribution

2022-02-24 Thread Artemis User
We have a Spark program that iterates through a while loop on the same input DataFrame and produces different results per iteration. I can see in the Spark UI that the workload is concentrated on a single core of the same worker. Is there any way to distribute the workload to different

[no subject]

2022-02-24 Thread Luca Borin
Unsubscribe

RE: Consuming from Kafka to delta table - stream or batch mode?

2022-02-24 Thread Michael Williams (SSI)
Thank you. From: Peyman Mohajerian [mailto:mohaj...@gmail.com] Sent: Thursday, February 24, 2022 9:00 AM To: Michael Williams (SSI) Cc: user@spark.apache.org Subject: Re: Consuming from Kafka to delta table - stream or batch mode? If you want to batch consume from Kafka, trigger-once config

Re: Consuming from Kafka to delta table - stream or batch mode?

2022-02-24 Thread Peyman Mohajerian
If you want to batch consume from Kafka, the trigger-once config would work with Structured Streaming, and you get the benefit of checkpointing. On Thu, Feb 24, 2022 at 6:07 AM Michael Williams (SSI) <michael.willi...@ssigroup.com> wrote: > Hello, > Our team is working with Spark (for the
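
A minimal PySpark sketch of the trigger-once approach mentioned above. It is not taken from the thread: the function name, its parameters, and the chosen columns are hypothetical, and it assumes the Kafka connector and Delta Lake packages are available to the Spark session. The checkpoint location is what lets each run resume from the offsets the previous run committed.

```python
def consume_kafka_once(spark, bootstrap_servers, topics, table_path, checkpoint_path):
    """Read all currently available Kafka messages, append them to a Delta
    table, then stop. `spark` is an existing SparkSession (assumed to have
    the Kafka and Delta Lake packages on its classpath)."""
    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topics)            # comma-separated topic list
        .option("startingOffsets", "earliest")  # only used on the very first run
        .load()
    )
    query = (
        stream.selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "topic",
            "timestamp",
        )
        .writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)  # stores consumed offsets
        .trigger(once=True)                             # process what is available, then stop
        .start(table_path)
    )
    query.awaitTermination()  # returns when the single run has finished
    return query
```

Scheduling this function periodically gives batch-style consumption, while removing the trigger line would turn the same code into a continuously running stream.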

Re: DataTables 1.10.20 reported vulnerable in spark-core_2.13:3.2.1

2022-02-24 Thread Sean Owen
What is the vulnerability and does it affect Spark? What is the remediation? Can you try updating these and open a pull request if it works? On Thu, Feb 24, 2022 at 7:28 AM vinodh palanisamy wrote: > Hi Team, > We are using spark-core_2.13:3.2.1 in our project. Where in that > version

Consuming from Kafka to delta table - stream or batch mode?

2022-02-24 Thread Michael Williams (SSI)
Hello, Our team is working with Spark (for the first time) and one of the sources we need to consume is Kafka (multiple topics). Are there any practical or operational issues to be aware of when deciding whether to a) consume in batches until all messages are consumed, then shut down the Spark

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Sean Owen
On the contrary, distributed deep learning is not data parallel. It's dominated by the need to share parameters across workers. Gourav, I don't understand what you're looking for. Have you looked at Petastorm and Horovod? They _use Spark_, not another platform like Ray. Why recreate this which has

DataTables 1.10.20 reported vulnerable in spark-core_2.13:3.2.1

2022-02-24 Thread vinodh palanisamy
Hi Team, We are using spark-core_2.13:3.2.1 in our project. In that version, a Black Duck scan reports the JS files below as vulnerable: dataTables.bootstrap4.1.10.20.min.js, jquery.dataTables.1.10.20.min.js. Please let me know if this can be fixed in my project or DataTables

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Bitfox
I have been using TensorFlow for a long time. It's not hard at all to implement a distributed training job, either by model parallelization or data parallelization. I don't think there is much need to develop Spark to support TensorFlow jobs. Just my thoughts... On Thu, Feb 24, 2022 at 4:36 PM

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Gourav Sengupta
Hi, I do not think that there is any reason for using over-engineered platforms like Petastorm and Ray, except for certain use cases. What Ray is doing, except for certain use cases, could have been easily done by Spark, I think, had the open source community got that steer. But maybe I am wrong