They are based on a physical column; the column is real. The function only
exists in the data source.
For example:
SELECT ttl(a), ttl(b) FROM ks.tab
On Tue, Sep 4, 2018 at 11:16 PM Reynold Xin wrote:
Russell your special columns wouldn’t actually work with option 1 because
Spark would have to fail them in analysis without an actual physical
column.
On Tue, Sep 4, 2018 at 9:12 PM Russell Spitzer wrote:
I'm a big fan of 1 as well. I had to implement something similar using
custom expressions and it was a bit more work than it should have been. In
particular, our use case is that columns carry certain metadata (ttl,
writetime) which exists not as separate columns but as special values that
can be surfaced.
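To make the use case concrete, here is roughly what such a query could look
like if the source were able to surface that metadata as source-specific
functions. This is a sketch only: it assumes an active SparkSession `spark`,
a Cassandra table ks.tab, and that the source registered ttl/writetime
functions; none of these names is an existing Spark API.

  // Sketch: surfacing per-cell TTL / write-time metadata via source functions.
  // The ttl/writetime functions and the ks.tab table are illustrative assumptions.
  val withMetadata = spark.sql(
    "SELECT a, b, ttl(b) AS b_ttl, writetime(b) AS b_writetime FROM ks.tab")
  withMetadata.show()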
Thanks for posting the summary. I'm strongly in favor of option 1.
I think that API footprint is fairly small, but worth it. Not only does it
make sources easier to implement by handling parsing, it also makes sources
more reliable because Spark handles validation the same way across sources.
A
+ Liang-Chi and Herman,
I think this is a common requirement: getting the top N records. For now we
guarantee it via the `TakeOrderedAndProject` operator. However, this
operator may not be used if the
spark.sql.execution.topKSortFallbackThreshold config has a small value.
Shall we reconsider
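For reference, a minimal sketch of the behavior being discussed; it assumes
an existing SparkSession `spark` and a DataFrame `df` with a `score` column,
both placeholders:

  import org.apache.spark.sql.functions.col

  // A top-N query. With the default config this is planned as TakeOrderedAndProject;
  // lowering spark.sql.execution.topKSortFallbackThreshold below the limit makes
  // Spark fall back to a full sort followed by a limit instead.
  spark.conf.set("spark.sql.execution.topKSortFallbackThreshold", "5")
  val top10 = df.orderBy(col("score").desc).limit(10)
  top10.explain() // check whether the physical plan still contains TakeOrderedAndProject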
I'm switching to another Gmail account; let's see if my mail still gets
dropped this time.
Hi Ryan,
I'm thinking about the write path and feel the abstraction should be the
same.
We still have a logical and a physical write, and the table can create
different logical writes depending on how the data is to be written.
Thanks
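A rough sketch of the symmetry being described here; the trait and method
names below are made up for illustration and are not the proposed API. The
idea: the table hands out a logical write, which carries the schema and
validation decisions, and that logical write produces the physical write the
executors actually run.

  import org.apache.spark.sql.types.StructType

  // Placeholder types so the sketch stands alone.
  trait WriterCommitMessage
  trait DataWriterFactory

  // The table creates a builder for a "logical write"...
  trait Table {
    def newWriteBuilder(options: Map[String, String]): WriteBuilder
  }

  // ...the logical write holds mode/schema/validation concerns...
  trait WriteBuilder {
    def withInputSchema(schema: StructType): WriteBuilder
    def buildForBatch(): BatchWrite
  }

  // ...and produces the physical write that is executed and committed.
  trait BatchWrite {
    def createWriterFactory(): DataWriterFactory
    def commit(messages: Array[WriterCommitMessage]): Unit
    def abort(messages: Array[WriterCommitMessage]): Unit
  }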
On Wed 5 Sep, 2018, 2:15 AM Russell Spitzer wrote:
> RDD: Top
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
> Which is pretty much what Sean suggested
>
> For Dataframes I think doing a order and
Ryan, Michael, and I discussed this offline today. Some notes here:
His use case is to support partitioning data by derived columns, rather
than physical columns, because he didn't want his users to keep adding a
"date" column when in reality it is purely derived from some timestamp
column.
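In other words, something like the sketch below: the partitioning is declared
as an expression over the timestamp column, and writers never supply a
physical "date" column. This is only an illustration of the request, not an
API that existed at the time of this thread; the catalog/table names, the
days(ts) transform, the writeTo call, and `eventsDf` are all placeholders
(roughly the shape DSv2 later grew).

  // Declare partitioning as a derived expression over ts (illustrative syntax).
  spark.sql("""
    CREATE TABLE catalog.db.events (id BIGINT, ts TIMESTAMP, payload STRING)
    USING somesource
    PARTITIONED BY (days(ts))
  """)

  // Writers only provide ts; the date partition value is derived from it.
  eventsDf.writeTo("catalog.db.events").append()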
Yuanjian, Thanks for sharing your progress! I was wondering if there was any
prototype code that we could read to get an idea of what the implementation
looks like? We can evaluate the design together and also benchmark workloads
from across the community – that is, we can collect more data
RDD: Top
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
Which is pretty much what Sean suggested.
For Dataframes I think doing an orderBy and limit would be equivalent after
optimizations.
On Tue, Sep 4, 2018 at 2:28
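Side by side, the two options mentioned above, as a small self-contained
example that only assumes an active SparkSession `spark`:

  import org.apache.spark.sql.functions.col

  // RDD: top-N via the implicit ordering, per the scaladoc linked above.
  val nums = spark.sparkContext.parallelize(Seq(3, 9, 1, 7, 5))
  val top3: Array[Int] = nums.top(3) // Array(9, 7, 5), descending

  // DataFrame: an orderBy + limit, which the planner turns into TakeOrderedAndProject.
  val df = spark.createDataFrame(Seq((3, "a"), (9, "b"), (7, "c"))).toDF("value", "tag")
  val top2 = df.orderBy(col("value").desc).limit(2)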
Latest from Wenchen in case it was dropped.
---------- Forwarded message ---------
From: Wenchen Fan
Date: Mon, Sep 3, 2018 at 6:16 AM
Subject: Re: data source api v2 refactoring
To:
Cc: Ryan Blue, Reynold Xin, <dev@spark.apache.org>
Hi Mridul,
I'm not sure what's going on, my email was
Sort and take head(n)?
On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri wrote:
Dear Spark dev, is there anything equivalent in Spark?
In a nutshell, what I'd like to do is instantiate a Pipeline (or an
extension of Pipeline) with metadata that is copied to the PipelineModel
when fitted, persisted with it, and can be read again when the fitted model
is loaded by another consumer. These params are specific to the
PipelineModel more than
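For concreteness, the shape of the ask seems to be something like the sketch
below. The standard Pipeline code is real Spark ML; the metadata calls are
hypothetical (commented out), and `training` is an assumed DataFrame with
text/label columns.

  import org.apache.spark.ml.{Pipeline, PipelineModel}
  import org.apache.spark.ml.classification.LogisticRegression
  import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
  val lr = new LogisticRegression().setMaxIter(10)
  val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

  // Hypothetical: attach metadata at construction time (no such API today).
  // pipeline.setMetadata(Map("owner" -> "teamA", "datasetVersion" -> "v42"))

  val model = pipeline.fit(training) // metadata would be copied onto the PipelineModel
  model.write.overwrite().save("/models/teamA/v42")

  val loaded = PipelineModel.load("/models/teamA/v42")
  // loaded.getMetadata("owner") // hypothetical: readable again by another consumer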
Same here, I don't see anything from Wenchen... just replies to him.
On Sat, Sep 1, 2018 at 9:31 PM Mridul Muralidharan wrote:
> Is it only me or are all others getting Wenchen’s mails? (Obviously Ryan did :-) )
> I did not see it in the mail thread I received or in archives ... [1]
the docs and publishing builds need some attention... i was planning on
looking into this after the 2.4 cut, and once the ubuntu port is a little
further along.
see:
https://amplab.cs.berkeley.edu/jenkins/label/spark-docs/
https://amplab.cs.berkeley.edu/jenkins/label/spark-packaging/
fyi, i haven't upgraded jenkins in a couple of years... (yeah, i know...
it's on my todo list).
i'm just assuming that it's an artifact of old PRs going 'stale' somehow,
but since that's not mentioned anywhere in the plugin docs i wouldn't bet
good money on that. :)
On Mon, Sep 3, 2018 at
Of course, we would like to eliminate all of the following tags
"flanky" or "flankytest"
Kazuaki Ishizaki
From: Hyukjin Kwon
To: dev
Cc: Xiao Li, Wenchen Fan
Date: 2018/09/04 14:20
Subject: Re: Spark JIRA tags clarification and management
Thanks, Reynold.
+Adding