Yes there is definitely some similarity to groupby

What is this used for:
https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/logical/LogicalTableFunctionScan.html
Can i model a mapPartitions T -> U as ^ ?

On Thu, Mar 4, 2021 at 12:06 PM Rui Wang <amaliu...@apache.org> wrote:

> I feel like the mapPartitions can be implemented as a SELECT + GROUP BY,
> where GROUP BY is to partition the data, then per partition computation is
> handled by the SELECT.
>
> -Rui
>
> On Thu, Mar 4, 2021 at 11:56 AM Debajyoti Roy <newroy...@gmail.com> wrote:
>
> > Thanks again Julian.
> >
> > Since, mapPartitions is really a specialized map would it be best to
> model
> > it as a SELECT (similar to functions inside an expression) ?
> > Barring cases where h > h' and mapPartitions acts like a filter.
> >
> > On Thu, Mar 4, 2021 at 11:41 AM Julian Hyde <jhyde.apa...@gmail.com>
> > wrote:
> >
> > > SQL has equivalents of many functional programming idioms:
> > >
> > >   * map is SELECT
> > >   * filter is WHERE
> > >   * flatMap is similar to CROSS APPLY
> > >
> > > That said, SQL’s strength is that the operations are not optimized for
> > any
> > > particular physical organization of data (e.g. working on sorted or
> > > partitioned data). mapPartitions is in this category. Of course a
> > physical
> > > implementation of one of SQL’s logical operators might use
> mapPartitions.
> > >
> > > Julian
> > >
> > >
> > >
> > >
> > >
> > > > On Mar 4, 2021, at 10:44 AM, Debajyoti Roy <newroy...@gmail.com>
> > wrote:
> > > >
> > > > Thanks for the responses, adding some more color below.
> > > >
> > > > Spark's API adopted concepts from the functional programming paradigm
> > > (map,
> > > > filter, flatmap,...) into data processing. Spark did add several
> > > relational
> > > > operators like join, union, select, etc. However, there are certain
> > APIs
> > > > that are really hard to model in terms of standard relational
> > operators.
> > > > Let me take one example of mapPartitions.
> > > >
> > > > mapPartitions( T -> U ):
> > > > w columns and h rows can turn into totally different w' != w columns
> > and
> > > h'
> > > > != h rows. Since processing happens per partition, this API is a
> great
> > > > choice for vectorized heavyweight initialization cost operations e.g.
> > > batch
> > > > inferencing.
> > > >
> > > > In terms of relational models, mapPartitions can be modeled just
> like a
> > > > function inside an expression operator. However, there can be
> > interesting
> > > > cases e.g. h > h' and mapMartitions starts to feel like a filter. Can
> > > there
> > > > be other challenges and opportunities in terms of planner and
> optimizer
> > > > because mapPartitions is definitely NOT like any other function
> inside
> > an
> > > > expression as shown below:
> > > >
> > > > SELECT name, address, mapPartitions(id, tweet, '{threshold: 0.5}',
> > > > 'sentiment_analysis_4', 10000) FROM my_twitter_data...
> > > >
> > > > So what is a better usage example for mapPartitions expressed as SQL
> ?
> > I
> > > am
> > > > really struggling with that part and I agree with Julian that is the
> > key.
> > > >
> > > > Regards,
> > > > Debajyoti Roy
> > > >
> > > > On Thu, Mar 4, 2021 at 12:01 AM Julian Hyde <jhyde.apa...@gmail.com>
> > > wrote:
> > > >
> > > >> I searched for mapPartitions and flatMapGroupsWithState, and it
> looks
> > as
> > > >> if you are talking about Apache Spark operations. Can you give some
> > > >> examples of typical queries that would use these operations?
> > > >>
> > > >> It’s possible that these operations accomplish things that are not
> > > >> possible in the relational model; but I think it’s more likely that
> > > these
> > > >> are algorithms that can implement relational operations such as
> > windowed
> > > >> aggregate functions. If you give some examples, we can see whether
> > they
> > > can
> > > >> be expressed in SQL or relational algebra.
> > > >>
> > > >> Julian
> > > >>
> > > >>
> > > >>> On Mar 3, 2021, at 10:54 PM, Rui Wang <amaliu...@apache.org>
> wrote:
> > > >>>
> > > >>> Well I think the expected approach is to translate other operations
> > to
> > > >>> relational operators by yourself ;-)
> > > >>>
> > > >>> I think Calcite won't be the place to have extensions for such
> > > >> translation.
> > > >>> The main concern is that those non relational operations are
> > > >> "non-standard".
> > > >>>
> > > >>> -Rui
> > > >>>
> > > >>> On Wed, Mar 3, 2021 at 10:12 PM Debajyoti Roy <newroy...@gmail.com
> >
> > > >> wrote:
> > > >>>
> > > >>>> Hi All,
> > > >>>>
> > > >>>> For operators like filter, join, union, aggregate, window the
> > > >>>> logical RelNode choices are obvious within
> > > >> org.apache.calcite.rel.logical
> > > >>>> package.
> > > >>>>
> > > >>>> However, for more complex applications that require operations
> like
> > > >>>> mapPartitions, flatMapGroupsWithState, etc. what would be some
> > choices
> > > >> for
> > > >>>> logical rel node and relational expression classes in Apache
> Calcite
> > > >>>> (independent of engine)?
> > > >>>>
> > > >>>> What is a good way to model operators that are not traditionally
> > > >> relational
> > > >>>> and hence do not exist in Calcite (looking for hooks / extension
> > > points
> > > >> /
> > > >>>> roadmaps)?
> > > >>>>
> > > >>>> Thanks in advance for any pointers, (disclaimer: I am new to
> > Calcite)
> > > >>>> Debajyoti Roy
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>

Reply via email to