Anirudh (or somebody else familiar with spark-on-k8s),
Can you create a short plan on how we would integrate and do code review to
merge the project? If the diff is too large it'd be difficult to review and
merge in one shot. Once we have a plan we can create subtickets to track
the progress.
I'm in the process of migrating a few applications from Spark 2.1.1 to
Spark 2.2.0 and so far the transition has been smooth. One odd thing is
that when I query a Hive table that I do not own but have read access to, I
get a very long WARNING with a stack trace that basically says I do not
have
That would be awesome. I’m not sure whether we want 3.0 to be right after 2.3
(I guess this Scala issue is one reason to start discussing that), but even if
we do, I imagine that wouldn’t be out for at least 4-6 more months after 2.3,
and that’s a long time to go without Scala 2.12 support. If
+1 on this and like the suggestion of type in string form.
Would it be correct to assume there will be a data type check, for example
that the returned pandas data frame column data types match what is
specified? We have seen quite a few issues/confusions with that in R.
Would it make sense to
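The dtype check being asked about could be sketched roughly as below. This is a plain-Python illustration of the idea only; the `SPARK_TO_PANDAS` mapping and `check_returned_dtypes` function are hypothetical names, not Spark's actual implementation.

```python
# Hypothetical sketch: verify that the column dtypes of a returned
# (pandas-like) data frame match the types declared in the UDF's schema.
# The mapping below is an assumption for illustration, not Spark's table.
SPARK_TO_PANDAS = {
    "long": {"int64"},
    "double": {"float64"},
    "string": {"object"},
}

def check_returned_dtypes(declared, actual):
    """Return a list of (column, declared_type, actual_dtype) mismatches."""
    mismatches = []
    for col, spark_type in declared.items():
        allowed = SPARK_TO_PANDAS.get(spark_type, set())
        if actual.get(col) not in allowed:
            mismatches.append((col, spark_type, actual.get(col)))
    return mismatches

# Example: the UDF declared column `v` as long but returned float64.
declared = {"id": "long", "v": "long"}
actual = {"id": "int64", "v": "float64"}
print(check_returned_dtypes(declared, actual))  # [('v', 'long', 'float64')]
```

A check like this could raise a clear error at UDF-return time instead of letting a silent coercion surprise the user later.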
Ok, thanks.
+1 on the SPIP for scope etc
On API details (will deal with in code reviews as well but leaving a note
here in case I forget)
1. I would suggest having the API also accept data type specification in
string form. It is usually simpler to say "long" than "LongType()".
2. Think about
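Point (1) above could be sketched as follows. This is a minimal stand-in to show the shape of the idea; the `LongType` class, `_STRING_ALIASES` table, and `parse_return_type` helper are hypothetical, not Spark's actual classes or parser.

```python
# Sketch: accept a return type either as a type object or as a shorthand
# string such as "long". Names here are illustrative stand-ins only.

class LongType:
    """Hypothetical stand-in for a SQL data type class."""
    def simple_string(self):
        return "bigint"

_STRING_ALIASES = {"long": LongType}

def parse_return_type(spec):
    """Accept either a DataType-like instance or a string alias."""
    if isinstance(spec, str):
        return _STRING_ALIASES[spec.lower()]()
    return spec

# Both forms resolve to the same type:
print(type(parse_return_type("long")).__name__)      # LongType
print(type(parse_return_type(LongType())).__name__)  # LongType
```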
Yes, the aggregation is out of scope for now.
I think we should continue discussing the aggregation at JIRA and we will
be adding those later separately.
Thanks.
On Fri, Sep 1, 2017 at 6:52 PM, Reynold Xin wrote:
> Is the idea aggregate is out of scope for the current
Why does ordering matter here for sort vs filter? The source should be able
to handle it in whatever way it wants (which is almost always filter
beneath sort I'd imagine).
The only ordering that'd matter in the current set of pushdowns is limit -
it should always mean the root of the pushed
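The claim that ordering doesn't matter for sort vs. filter can be seen with a tiny example: filtering before sorting and sorting before filtering produce the same rows in the same order, so the source is free to apply them in whichever order it prefers. This is a plain-Python illustration, not Spark code.

```python
# Filter commutes with sort: both orders yield the same sorted, filtered rows.
rows = [5, 1, 4, 2, 3]
pred = lambda x: x % 2 == 1  # keep odd values

sort_then_filter = [x for x in sorted(rows) if pred(x)]
filter_then_sort = sorted(x for x in rows if pred(x))

print(sort_then_filter)  # [1, 3, 5]
print(filter_then_sort)  # [1, 3, 5]
```

Limit is different: `limit(sort(x))` and `sort(limit(x))` return different rows, which is why it is the one pushdown whose position in the plan must be pinned.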
Is the idea aggregate is out of scope for the current effort and we will be
adding those later?
On Fri, Sep 1, 2017 at 8:01 AM Takuya UESHIN wrote:
> Hi all,
>
> We've been discussing supporting vectorized UDFs in Python and we have
> almost reached a consensus on the APIs, so
Hi,
Just reviewing StateStoreRestoreExec [1] and have been wondering how to
tell whether a state was available for a key. It has numOutputRows
metric [2], but that gives the number of aggregations from the child
operator only and seems to say nothing about whether state was
available for an
OK, what I'll do is focus on some changes that can be merged to master
without impacting the 2.11 build (e.g. putting kafka-0.8 behind a profile,
maybe, or adding the 2.12 REPL). Anything that is breaking, we can work on
in a series of open PRs, or maybe a branch, yea. It's unusual but might be
If the changes aren’t that hard, I think we should also consider building a
Scala 2.12 version of Spark 2.3 in a separate branch. I’ve definitely seen
concerns from some large Scala users that Spark isn’t supporting 2.12 soon
enough. I thought SPARK-14220 was blocked mainly because the changes
Hi all,
We've been discussing supporting vectorized UDFs in Python and we have
almost reached a consensus on the APIs, so I'd like to summarize and call for a
vote.
Note that this vote should focus on APIs for vectorized UDFs, not APIs for
vectorized UDAFs or Window operations.
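The core idea behind the vote can be illustrated in plain Python: rather than invoking the UDF once per row, a vectorized UDF is invoked once per batch of values. This is a conceptual sketch only; Spark's actual proposal (pandas/Arrow batches) is not reproduced here, and the function names are illustrative.

```python
# Row-at-a-time vs. vectorized (batched) UDF evaluation, conceptually.

def row_at_a_time(f, values):
    return [f(v) for v in values]  # one Python call per row

def vectorized(f_batch, values, batch_size=2):
    out = []
    for i in range(0, len(values), batch_size):
        out.extend(f_batch(values[i:i + batch_size]))  # one call per batch
    return out

plus_one = lambda v: v + 1
plus_one_batch = lambda batch: [v + 1 for v in batch]

print(row_at_a_time(plus_one, [1, 2, 3]))    # [2, 3, 4]
print(vectorized(plus_one_batch, [1, 2, 3])) # [2, 3, 4]
```

Amortizing the per-call overhead across a batch (and, in Spark's case, operating on columnar Arrow data) is what makes the vectorized form attractive.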