Hi, Shane,
Thank you!
Xiao
2018-04-30 20:27 GMT-07:00 shane knapp :
> we just noticed that we're unable to connect to jenkins, and have reached
> out to our NOC support staff at our colo. until we hear back, there's
> nothing we can do.
>
> i'll update the list as soon as i hear something. sorry for the
> inconvenience!
>
> shane
we just noticed that we're unable to connect to jenkins, and have reached
out to our NOC support staff at our colo. until we hear back, there's
nothing we can do.
i'll update the list as soon as i hear something. sorry for the
inconvenience!
shane
--
Shane Knapp
UC Berkeley EECS Research / RIS
This seems to be an underexposed part of the API. My use case is this: I
want to unpersist all DataFrames except a specific few. I want to do this
because I know at a specific point in my pipeline that I have a handful of
DataFrames that I need, and everything else is no longer needed.
The problem
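(One way to approximate that use case today, as a workaround sketch rather than an existing Spark API: route every persist() call through your own registry, then drop everything outside an explicit keep-set. The CacheRegistry object and unpersistAllExcept method names below are purely illustrative.)

import scala.collection.mutable
import org.apache.spark.sql.DataFrame

// Workaround sketch, not an existing Spark API: track what you persist
// yourself, then unpersist everything that is not in the keep-set.
object CacheRegistry {
  private val cached = mutable.Set.empty[DataFrame]

  def persist(df: DataFrame): DataFrame = {
    cached += df
    df.persist()
  }

  def unpersistAllExcept(keep: DataFrame*): Unit = {
    val toDrop = cached.toSet -- keep.toSet   // DataFrames are compared by reference here
    toDrop.foreach(_.unpersist())
    cached --= toDrop
  }
}

At the point in the pipeline where only a handful of DataFrames are still needed, CacheRegistry.unpersistAllExcept(dfA, dfB) releases the rest.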
Please open a JIRA then!
On Fri, Apr 27, 2018 at 3:59 AM Hemant Bhanawat wrote:
> I see.
>
> monotonically_increasing_id on streaming DataFrames will be really helpful
> to me and I believe to many more users. Adding this functionality in Spark
> would be efficient in terms of performance as com
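(For reference, this is how the function behaves on a batch DataFrame today; the request above is to support the same column function on streaming DataFrames, where it is not supported and prompted this thread. A small sketch:)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("mono-id-example").getOrCreate()
import spark.implicits._

// IDs are guaranteed unique and monotonically increasing, but not consecutive:
// the partition id is encoded in the upper bits and the per-partition record
// number in the lower bits.
val df = Seq("a", "b", "c").toDF("value")
  .withColumn("id", monotonically_increasing_id())

df.show()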
Hello there,
I have a quick question regarding how to share data (a small data
collection) between a Kafka producer and consumer using Spark Streaming
(Spark 2.2):
(A)
The data published by a Kafka producer is received in order on the Kafka
consumer side (see (a) copied below).
(B)
However, coll
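(In case it helps frame the question: a minimal sketch of the consumer side with the built-in Kafka source, assuming Structured Streaming on Spark 2.2; the broker address and topic name are placeholders. Note that Kafka only guarantees ordering within a single partition, not across partitions.)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-consumer-example").getOrCreate()

// Read the topic as a streaming DataFrame; key and value arrive as binary.
val messages = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "my-topic")                       // placeholder topic
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

// Print each micro-batch to the console for inspection.
val query = messages.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()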
I'd argue that letting bad cases influence the design is an explicit goal
of DataSourceV2. One of the primary motivations for the project was that
file sources hook into a series of weird internal side channels, with
favorable performance characteristics that are difficult to match in the
API we ac
Should we really plan the API for a source with state that grows
indefinitely? It sounds like we're letting a bad case influence the design,
when we probably shouldn't.
On Mon, Apr 30, 2018 at 11:05 AM, Joseph Torres <joseph.tor...@databricks.com> wrote:
> Offset is just a type alias for arbitrary JSON-serializable state.
Offset is just a type alias for arbitrary JSON-serializable state. Most
implementations should (and do) just toss the blob at Spark and let Spark
handle recovery on its own.
In the case of file streams, the obstacle is that the conceptual offset is
very large: a list of every file which the stream
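(To make the "JSON-serializable blob" point concrete, a minimal custom Offset could look like the sketch below. The package path assumes the Spark 2.3 DataSourceV2 layout, and CounterOffset is just an illustrative name.)

import org.apache.spark.sql.sources.v2.reader.streaming.Offset

// The offset is whatever the source can round-trip through JSON; Spark
// checkpoints the string and hands it back on recovery.
case class CounterOffset(count: Long) extends Offset {
  override def json(): String = s"""{"count":$count}"""
}

object CounterOffset {
  // Naive parsing for the sketch; a real source would use a JSON library.
  def fromJson(json: String): CounterOffset =
    CounterOffset(json.replaceAll("[^0-9]", "").toLong)
}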
Why don't we just have the source return a Serializable of state when it
reports offsets? Then Spark could handle storing the source's state and the
source wouldn't need to worry about file system paths. I think that would
be easier for implementations and better for recovery because it wouldn't
le
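(Purely to illustrate the shape of that proposal, and emphatically not an existing Spark interface, it might look something like this:)

import java.io.Serializable

// Hypothetical sketch of the "source returns Serializable state" idea:
// the source reports an offset plus an opaque blob, Spark persists the blob,
// and the source never touches checkpoint file paths itself.
trait CheckpointedSource {
  /** Current offset (as JSON) together with the opaque state to store for it. */
  def reportOffset(): (String, Serializable)

  /** Called on recovery with the state blob Spark stored for the restart offset. */
  def restore(offsetJson: String, state: Serializable): Unit
}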
This would be useful to us, so I've created a JIRA ticket for this
discussion: https://issues.apache.org/jira/browse/SPARK-24122
On Wed, Mar 28, 2018 at 10:28 AM, Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:
> We discussed this early on in our fork and I think we should have this i