Hi, Shane,
Thank you!
Xiao
2018-04-30 20:27 GMT-07:00 shane knapp :
> we just noticed that we're unable to connect to jenkins, and have reached
> out to our NOC support staff at our colo. until we hear back, there's
> nothing we can do.
>
> i'll update the list as soon as i hear something. sorry for the
> inconvenience!
>
> shane
we just noticed that we're unable to connect to jenkins, and have reached
out to our NOC support staff at our colo. until we hear back, there's
nothing we can do.
i'll update the list as soon as i hear something. sorry for the
inconvenience!
shane
--
Shane Knapp
UC Berkeley EECS Research / RIS
This seems to be an underexposed part of the API. My use case is this: I
want to unpersist all DataFrames except a specific few. I want to do this
because I know at a specific point in my pipeline that I have a handful of
DataFrames that I need, and everything else is no longer needed.
The problem
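(One way to approximate that use case today, as a workaround sketch rather than an existing Spark API: route every persist() call through your own registry, then drop everything outside an explicit keep-set. The CacheRegistry object and unpersistAllExcept method names below are purely illustrative.)

import scala.collection.mutable
import org.apache.spark.sql.DataFrame

// Workaround sketch, not an existing Spark API: track what you persist
// yourself, then unpersist everything that is not in the keep-set.
object CacheRegistry {
  private val cached = mutable.Set.empty[DataFrame]

  def persist(df: DataFrame): DataFrame = {
    cached += df
    df.persist()
  }

  def unpersistAllExcept(keep: DataFrame*): Unit = {
    val toDrop = cached.toSet -- keep.toSet   // DataFrames are compared by reference here
    toDrop.foreach(_.unpersist())
    cached --= toDrop
  }
}

At the point in the pipeline where only a handful of DataFrames are still needed, CacheRegistry.unpersistAllExcept(dfA, dfB) releases the rest.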
Please open a JIRA then!
On Fri, Apr 27, 2018 at 3:59 AM Hemant Bhanawat wrote:
> I see.
>
> monotonically_increasing_id on streaming DataFrames will be really helpful
> to me and I believe to many more users. Adding this functionality in Spark
> would be efficient in terms of performance as com
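(For reference, this is how the function behaves on a batch DataFrame today; the request above is to support the same column function on streaming DataFrames, where it is not supported and prompted this thread. A small sketch:)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("mono-id-example").getOrCreate()
import spark.implicits._

// IDs are guaranteed unique and monotonically increasing, but not consecutive:
// the partition id is encoded in the upper bits and the per-partition record
// number in the lower bits.
val df = Seq("a", "b", "c").toDF("value")
  .withColumn("id", monotonically_increasing_id())

df.show()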
Hello there,
I have a quick question regarding how to share data (a small data
collection) between a Kafka producer and consumer using Spark Streaming
(Spark 2.2):
(A)
The data published by a Kafka producer is received in order on the Kafka
consumer side (see (a) copied below).
(B)
However, coll
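(In case it helps frame the question: a minimal sketch of the consumer side with the built-in Kafka source, assuming Structured Streaming on Spark 2.2; the broker address and topic name are placeholders. Note that Kafka only guarantees ordering within a single partition, not across partitions.)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-consumer-example").getOrCreate()

// Read the topic as a streaming DataFrame; key and value arrive as binary.
val messages = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "my-topic")                       // placeholder topic
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

// Print each micro-batch to the console for inspection.
val query = messages.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()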
I'd argue that letting bad cases influence the design is an explicit goal
of DataSourceV2. One of the primary motivations for the project was that
file sources hook into a series of weird internal side channels, with
favorable performance characteristics that are difficult to match in the
API we ac
Should we really plan the API for a source with state that grows
indefinitely? It sounds like we're letting a bad case influence the design,
when we probably shouldn't.
On Mon, Apr 30, 2018 at 11:05 AM, Joseph Torres <joseph.tor...@databricks.com> wrote:
> Offset is just a type alias for arbitrary JSON-serializable state.
Offset is just a type alias for arbitrary JSON-serializable state. Most
implementations should (and do) just toss the blob at Spark and let Spark
handle recovery on its own.
In the case of file streams, the obstacle is that the conceptual offset is
very large: a list of every file which the stream
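(To make the "JSON-serializable blob" point concrete, a minimal custom Offset could look like the sketch below. The package path assumes the Spark 2.3 DataSourceV2 layout, and CounterOffset is just an illustrative name.)

import org.apache.spark.sql.sources.v2.reader.streaming.Offset

// The offset is whatever the source can round-trip through JSON; Spark
// checkpoints the string and hands it back on recovery.
case class CounterOffset(count: Long) extends Offset {
  override def json(): String = s"""{"count":$count}"""
}

object CounterOffset {
  // Naive parsing for the sketch; a real source would use a JSON library.
  def fromJson(json: String): CounterOffset =
    CounterOffset(json.replaceAll("[^0-9]", "").toLong)
}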
Why don't we just have the source return a Serializable of state when it
reports offsets? Then Spark could handle storing the source's state and the
source wouldn't need to worry about file system paths. I think that would
be easier for implementations and better for recovery because it wouldn't
le
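(Purely to illustrate the shape of that proposal, and emphatically not an existing Spark interface, it might look something like this:)

import java.io.Serializable

// Hypothetical sketch of the "source returns Serializable state" idea:
// the source reports an offset plus an opaque blob, Spark persists the blob,
// and the source never touches checkpoint file paths itself.
trait CheckpointedSource {
  /** Current offset (as JSON) together with the opaque state to store for it. */
  def reportOffset(): (String, Serializable)

  /** Called on recovery with the state blob Spark stored for the restart offset. */
  def restore(offsetJson: String, state: Serializable): Unit
}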
This would be useful to us, so I've created a JIRA ticket for this
discussion: https://issues.apache.org/jira/browse/SPARK-24122
On Wed, Mar 28, 2018 at 10:28 AM, Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:
> We discussed this early on in our fork and I think we should have this i