Re: Spark Kubernetes Operator

2023-04-14 Thread Yuval Itzchakov
> On Fri, 14 Apr 2023 at 17:42, Yuval Itzchakov wrote: >> Hi, >> >> ATM I see the most used option for a Spark

Spark Kubernetes Operator

2023-04-14 Thread Yuval Itzchakov
Hi, ATM I see the most used option for a Spark operator is the one provided by Google: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator Unfortunately, it doesn't seem actively maintained. Are there any plans to support an official, Apache Spark community-driven operator?
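For context, applications are submitted to the GoogleCloudPlatform operator as a SparkApplication custom resource. A minimal sketch of such a manifest follows; the image tag, namespace, and jar path are placeholders, not values from this thread:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "apache/spark:3.4.0"          # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples.jar"
  sparkVersion: "3.4.0"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: "512m"
```

The operator watches these resources and translates them into `spark-submit` invocations against the cluster.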

Re: _spark_metadata path issue with S3 lifecycle policy

2023-04-13 Thread Yuval Itzchakov
< yur...@gmail.com> wrote: > Yeah, but can't you use the following? > 1. For data files: my/path/part- > 2. For partitioned data: my/path/partition= > > > Best regards > > On 13 Apr 2023, at 12:58, Yuval Itzchakov wrote: > >  > The problem is that specifying tw

Re: _spark_metadata path issue with S3 lifecycle policy

2023-04-13 Thread Yuval Itzchakov
sue > > Best regards > > > On 13 Apr 2023, at 11:52, Yuval Itzchakov wrote: > > > >  > > Hi everyone, > > > > I am using Spark's FileStreamSink in order to write files to S3. On the > S3 bucket, I have a lifecycle policy that deletes data older than

_spark_metadata path issue with S3 lifecycle policy

2023-04-13 Thread Yuval Itzchakov
run into a similar problem? -- Best Regards, Yuval Itzchakov.
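The workaround suggested in this thread is to scope the lifecycle rule to the data-file prefix so it never touches the sink's `_spark_metadata` directory. A sketch of such a rule (bucket layout and retention period are placeholders):

```json
{
  "Rules": [
    {
      "ID": "expire-old-data-only",
      "Status": "Enabled",
      "Filter": { "Prefix": "my/path/part-" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```

Because the filter matches only objects under `my/path/part-`, files written by FileStreamSink expire while `my/path/_spark_metadata/` is left intact, so the sink can still resolve its commit log on restart.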

Regression in Spark SQL UI Tab in Spark 2.2.1

2018-01-11 Thread Yuval Itzchakov
Hi, I've recently installed Spark 2.2.1, and it seems like the SQL tab isn't getting updated at all. Although the "Jobs" tab gets updated with new incoming jobs, the SQL tab remains empty the whole time. I was wondering if anyone else has noticed such a regression in 2.2.1? -- Best Rega

Re: Terminating Structured Streaming Applications on Source Failure

2017-08-29 Thread Yuval Itzchakov
minated immediately. And the stream execution > threads are all daemon threads, so it should not affect the termination of > the application whether the queries are active or not. May be something > else is keeping the application alive? > > > > On Tue, Aug 29, 2017 at 2:09 AM, Yuval I

Terminating Structured Streaming Applications on Source Failure

2017-08-29 Thread Yuval Itzchakov
consuming anything from the source and is logically dead. Should this be the behavior? I think that perhaps there should be a configuration that asks whether to completely shut down the application on source failure. What do you guys think? -- Best Regards, Yuval Itzchakov.
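A workaround along these lines (a sketch of one possible approach, not the configuration proposed above) is to watch for terminated queries with a StreamingQueryListener and stop the application explicitly when one dies with an error:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Sketch: shut the whole application down when any query fails.
val spark = SparkSession.builder().appName("example").getOrCreate()

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {
    // `exception` is defined when the query failed, e.g. on a dead source.
    if (event.exception.isDefined) {
      spark.stop()
      sys.exit(1)
    }
  }
})
```

This forces termination from user code rather than relying on the daemon stream-execution threads discussed in the reply above.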

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-16 Thread Yuval Itzchakov
; Is Tachyon (Alluxio), HDFS (Parquet), NoSQL (HBase, Cassandra), RDBMS > (PostgreSQL, MySQL), Object Store (S3, Swift), or anything else I can't think of > going to be the underlying near real-time storage system? > > Thanks, > Ben > > > On May 15, 2016, at 3:36 PM, Yuval Itz

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-16 Thread Yuval Itzchakov
ateByKey and mapWithState and how that will be > handled... > > At a quick glance at the code, it seems to be used already in streaming > aggregations. > > Just my two cents, > > Ofir Manor > > Co-Founder & CTO | Equalum > > Mobile: +972-54-7801286 | Email: ofir.ma.

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-16 Thread Yuval Itzchakov
things like "mapWithState" are going to be translated into RQ, and I think that's the gap that's causing my misunderstanding. On Mon, May 16, 2016 at 1:36 AM Yuval Itzchakov <yuva...@gmail.com> wrote: > Hi Ofir, > Thanks for the detailed answer. I have read both docu

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-15 Thread Yuval Itzchakov
Hi Ofir, Thanks for the detailed answer. I have read both documents, where they touch only lightly on infinite Dataframes/Datasets. However, they do not go into depth on how existing transformations on DStreams, for example, will be translated into the Dataset APIs. I've been browsing
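For what it's worth, the Structured Streaming counterpart to DStream's mapWithState only arrived later, as mapGroupsWithState in Spark 2.2, well after this thread. A sketch of the translation, assuming a simple per-key counter:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Sketch: a per-key running count, the moral equivalent of mapWithState.
case class Event(key: String)

def countEvents(key: String, events: Iterator[Event],
                state: GroupState[Long]): (String, Long) = {
  val newCount = state.getOption.getOrElse(0L) + events.size
  state.update(newCount)
  (key, newCount)
}

// events: Dataset[Event] read from a streaming source
// events.groupByKey(_.key)
//   .mapGroupsWithState(GroupStateTimeout.NoTimeout())(countEvents)
```

The shape is the same as the DStream API: a user function receives the key, the new values for the batch, and a mutable per-key state handle.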

Re: Using dynamic allocation and shuffle service in Standalone Mode

2016-03-08 Thread Yuval Itzchakov
rvices yourself separately. > > -Andrew > > 2016-03-08 11:21 GMT-08:00 Silvio Fiorito <silvio.fior...@granturing.com>: > >> There's a script to start it up under sbin, start-shuffle-service.sh. Run >> that on each of your worker nodes.

Re: Using dynamic allocation and shuffle service in Standalone Mode

2016-03-08 Thread Yuval Itzchakov
Actually, I assumed that setting the flag in the Spark job would turn on the shuffle service in the workers. I now understand that assumption was wrong. Is there any way to set the flag via the driver, or must I manually set it via spark-env.sh on each worker? On Tue, Mar 8, 2016, 20:14 Silvio
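As the replies clarify, the driver-side flag does not start the service; each worker must have it enabled locally. A sketch of the standalone-mode setup (paths are placeholders):

```shell
# On every worker, in conf/spark-env.sh, so the Worker launches the
# external shuffle service itself:
SPARK_WORKER_OPTS="-Dspark.shuffle.service.enabled=true"

# Alternatively, start the external shuffle service by hand on each worker:
$SPARK_HOME/sbin/start-shuffle-service.sh
```

The driver then only needs `spark.dynamicAllocation.enabled=true` and `spark.shuffle.service.enabled=true` in its own configuration to make use of the service the workers are running.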

Re: Using a non serializable third party JSON serializable on a spark worker node throws NotSerializableException

2016-03-01 Thread Yuval Itzchakov
As I said, it is the method which eventually serializes the object. It is declared inside a companion object of a case class. The problem is that Spark will still try to serialize the method, as it needs to execute on the worker. How will that change the fact that `EncodeJson[T]` is not
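A common way around this class of problem (a sketch with hypothetical names; `MyEncoder` stands in for the non-serializable Argonaut `EncodeJson[T]`) is to avoid capturing the value in the serialized closure at all, by rebuilding it lazily on each executor:

```scala
// Sketch: keep the non-serializable encoder out of the serialized closure.
object JsonSupport extends Serializable {
  // @transient lazy: never shipped from the driver; rebuilt on each JVM.
  @transient lazy val encoder: MyEncoder = new MyEncoder

  def toJson(value: String): String = encoder.encode(value)
}

class MyEncoder {                      // stand-in for EncodeJson[T]
  def encode(value: String): String = s"""{"value":"$value"}"""
}

// rdd.map(JsonSupport.toJson)  // only the serializable object is captured
```

Because the encoder field is `@transient lazy`, Spark serializes only the (empty) object, and each executor constructs its own encoder on first use instead of receiving it over the wire.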

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Yuval Itzchakov
(wrappedState.isUpdated || timeoutThresholdTime.isDefined) >> /* <--- problem is here */ { >> newStateMap.put(key, wrappedState.get(), batchTime.milliseconds) >> } >> mappedData ++= returned >> } >> {code} >> >> In case the stream has a timeout set, but the state wasn't set at all, the >> "else-if" will still follow through because the timeout is defined but >> "wrappedState" is empty and wasn't set. >> >> If it is mandatory to update state for each entry of *mapWithState*, then >> this code should throw a better exception than "NoSuchElementException", >> which doesn't really say anything to the developer. >> >> I haven't provided a fix myself because I'm not familiar with the Spark >> implementation, but it seems there needs to either be an extra check >> that the state is set, or, as previously stated, a better exception message. >> >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/PairDStreamFunctions-mapWithState-fails-in-case-timeout-is-set-without-updating-State-S-tp26147.html >> > -- Best Regards, Yuval Itzchakov.
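Until a fix was in place, the practical workaround for the bug described here was to always touch the state before returning when a timeout is configured. A sketch:

```scala
import org.apache.spark.streaming.State

// Sketch: always call state.update (or rely on the timeout branch) so the
// timeout path never sees an unset State[S] and hits NoSuchElementException.
def trackState(key: String, value: Option[Int],
               state: State[Int]): Option[(String, Int)] = {
  if (state.isTimingOut()) {
    None                               // state is removed automatically on timeout
  } else {
    val sum = state.getOption().getOrElse(0) + value.getOrElse(0)
    state.update(sum)                  // always update, even when sum is unchanged
    Some((key, sum))
  }
}

// val spec = StateSpec.function(trackState _).timeout(Minutes(10))
// pairDStream.mapWithState(spec)
```

Updating the state unconditionally keeps `wrappedState` non-empty, so the `else-if` branch quoted above can safely call `wrappedState.get()`.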

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Yuval Itzchakov
Awesome. Thanks for the super fast reply. On Thu, Feb 4, 2016, 21:16 Tathagata Das <tathagata.das1...@gmail.com> wrote: > Shixiong has already opened the PR - > https://github.com/apache/spark/pull/11081 > > On Thu, Feb 4, 2016 at 11:11 AM, Yuval Itzchakov <yuva...@gmail.c