I follow now, thanks

On Fri, Sep 9, 2016 at 12:48 PM Todd Lipcon <t...@cloudera.com> wrote:

> Agreed with what David wrote. The dropping of 'key' status in projections
> is intentional. It's the same as you get in MySQL if you have something
> like:
>
> CREATE TABLE t (
>   x integer primary key,
>   y integer
> );
>
> CREATE TABLE t2 AS SELECT * FROM t;
>
> the 'select' results don't carry any notion of which columns were
> originally keys.
>
> In Kudu it smells a little funny today because we have two restrictions
> that MySQL doesn't have:
> 1) our keys must be listed first in the schema
> 2) all tables must have primary keys
>
> However, we're in the process of lifting the first restriction (hoping to
> finish it in the coming weeks), and I wouldn't be surprised if at some
> point we lifted the second as well (some tables work just fine as 'bags'
> with no need for PK constraints or organization).
>
> -Todd
>
>
> On Fri, Sep 9, 2016 at 9:01 AM, David Alves <davidral...@gmail.com> wrote:
>
> > Oh I see what you mean.
> > It's not that the keys are getting dropped, it's that they're not marked
> > as keys.
> > This arguably makes sense on a projection: for instance you might want
> > the keys returned at the end of the projection, while table schemas (at
> > least for now) require that they are present at the beginning of the
> > schema.
> > If you really want to create a new table based on an existing one, you
> > could get the schema from KuduTable. That one should be complete.
> >
> > -david
> >
> > On Fri, Sep 9, 2016 at 8:51 AM, Jordan Birdsell <jordantbirds...@gmail.com>
> > wrote:
> >
> > > Right, what I'm saying is, if I do include the key in my projection,
> > > the schema does not maintain it as a key. The issue isn't so much that
> > > I can't apply predicates to the key column, it's that if I wanted to
> > > create a projection and then use it to create a table, I'd have to
> > > rebuild the schema (i.e., the schema returned is effectively useless
> > > for creating new tables). This pattern of creating tables from
> > > projections is pretty common in dataframe-like libraries in Python.
> > >
> > > gist of offending code with comment on issue:
> > > https://gist.github.com/jtbirdsell/e376a7fa21f3b1893efa7e1ddac408d7
> > >
> > >
> > > On Fri, Sep 9, 2016 at 11:38 AM David Alves <davidral...@gmail.com>
> > > wrote:
> > >
> > > > Wait, if you _do_ set a projection on the scanner that does not
> > > > include the keys, then they won't be returned (and won't appear in
> > > > the projection's schema).
> > > > Note that this does not mean that you can't set predicates on the
> > > > key; they'll be evaluated server side, but the key won't actually
> > > > be returned.
> > > > Maybe I'm misunderstanding what you're saying?
> > > > Care to post a gist with the offending code?
> > > >
> > > > -david
> > > >
> > > >
> > > >
> > > > On Fri, Sep 9, 2016 at 8:26 AM, Jordan Birdsell <jordantbirds...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey David,
> > > > >
> > > > > Yep, I'm sure. Taking a look at the ScanConfiguration class, the
> > > > > issue seems to be here:
> > > > >
> > > > > Status ScanConfiguration::SetProjectedColumnIndexes(
> > > > >     const vector<int>& col_indexes) {
> > > > > ....
> > > > >   RETURN_NOT_OK(s->Reset(cols, 0));
> > > > > ....
> > > > >
> > > > > In the SetProjectedColumnIndexes method (which is also used by
> > > > > SetProjectedColumnNames), we're setting the schema without the
> > > > > key.
> > > > >
> > > > > There are probably a couple of ways to address this:
> > > > >
> > > > >    1. Check if all key columns are in the projection, and if so,
> > > > >    maintain the key.
> > > > >    2. Provide an optional parameter that lets users set the key
> > > > >    for the new projection. This would be beneficial for cases
> > > > >    where the user may want to create a new table based on their
> > > > >    projection.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Jordan
> > > > >
> > > > > On Fri, Sep 9, 2016 at 11:08 AM David Alves <davidral...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jordan
> > > > > >
> > > > > >   KuduScanner::GetProjectionSchema returns the schema of the
> > > > > > projection that was previously set on the scanner. If you don't
> > > > > > set a projection it should indeed return all the columns.
> > > > > >   Are you sure you didn't set a projection (with
> > > > > > SetProjectedColumnNames or SetProjectedColumnIndexes) that
> > > > > > excluded the key?
> > > > > >
> > > > > > Best
> > > > > > David
> > > > > >
> > > > > > On Fri, Sep 9, 2016 at 5:16 AM, Jordan Birdsell <jordantbirds...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey folks,
> > > > > > >
> > > > > > > I was doing some work on KUDU-854
> > > > > > > <https://issues.apache.org/jira/browse/KUDU-854> and when
> > > > > > > testing the KuduScanner::GetProjectionSchema method call, found
> > > > > > > that the key was being dropped, which makes this much more
> > > > > > > challenging to test. Any idea whether dropping the key
> > > > > > > information in a scanner projection is intended? I would
> > > > > > > imagine this could prevent functionality like creating new
> > > > > > > tables from a projection.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jordan
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
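
For anyone reading this thread later, here is a rough sketch of the first option raised above (keep key status only when every key column is part of the projection). It is a toy model, not Kudu code: ToyColumn, ToySchema, CountKeys and ProjectKeepingFullKey are hypothetical names made up for illustration, and the sketch ignores the "keys must be listed first" restriction mentioned in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a column: a name plus a "part of the key" flag.
struct ToyColumn {
  std::string name;
  bool is_key;
};

// Hypothetical stand-in for a schema: an ordered list of columns.
using ToySchema = std::vector<ToyColumn>;

// Number of columns flagged as key columns.
std::size_t CountKeys(const ToySchema& schema) {
  std::size_t n = 0;
  for (const ToyColumn& col : schema) {
    if (col.is_key) ++n;
  }
  return n;
}

// Builds a projection in the requested column order. Key flags survive only
// if every key column of the table was selected; otherwise all are cleared,
// which matches the "no keys in a projection" behaviour discussed above.
ToySchema ProjectKeepingFullKey(const ToySchema& table,
                                const std::vector<std::size_t>& col_indexes) {
  ToySchema out;
  std::set<std::size_t> projected_keys;  // distinct key columns selected
  for (std::size_t idx : col_indexes) {
    assert(idx < table.size());
    out.push_back(table[idx]);
    if (table[idx].is_key) projected_keys.insert(idx);
  }
  if (projected_keys.size() != CountKeys(table)) {
    for (ToyColumn& col : out) col.is_key = false;
  }
  return out;
}
```

With a table (x key, y, z), projecting {y, x} would keep x flagged as a key, while projecting {y, z} would yield a schema with no key columns at all.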
