I follow now, thanks On Fri, Sep 9, 2016 at 12:48 PM Todd Lipcon <t...@cloudera.com> wrote:
> Agreed with what David wrote. The dropping of 'key' status in projections > is intentional. It's the same as you get in MySQL if you have something > like: > > CREATE TABLE t ( > x integer primary key, > y integer > ); > > CREATE TABLE t2 AS SELECT * FROM t; > > the 'select' results don't carry any notion of which columns were > originally keys. > > In Kudu it smells a little funny today because we have two restrictions > that MySQL doesn't have: > 1) our keys must be listed first in the schema > 2) all tables must have primary keys > > However, we're in the process of lifting the first restriction (hoping to > finish it in the coming weeks), and I wouldn't be surprised if at some > point we lifted the second as well (some tables work just fine as 'bags' > with no need for PK constraints or organization). > > -Todd > > > On Fri, Sep 9, 2016 at 9:01 AM, David Alves <davidral...@gmail.com> wrote: > > > Oh I see what you mean. > > It's not that the keys are getting dropped, its that they're not marked > as > > keys. > > This arguably makes sense on a projection: for instance you might want > the > > keys returns in the end of the projection, while table schemas (at least > > for now) require that they are present at the beginning of the > projection. > > If you really want to create a new table based on an existing one, you > > could get the schema from KuduTable. That one should be complete. > > > > -david > > > > On Fri, Sep 9, 2016 at 8:51 AM, Jordan Birdsell < > jordantbirds...@gmail.com > > > > > wrote: > > > > > Right, what i'm saying is, if i do include the key in my projection, > the > > > schema does not maintain it as a key. The issue isnt so much that i > cant > > > apply predicates to the key column, its that if i wanted to create a > > > projection and then want to use that projection to create a table based > > on > > > that projection, i'd have to rebuild the schema (i.e., the schema > > returned > > > is effectively useless for creating new tables). This pattern of > > creating > > > tables from projections is pretty common in dataframe like libraries in > > > python. > > > > > > gist of offending code with comment on issue: > > > https://gist.github.com/jtbirdsell/e376a7fa21f3b1893efa7e1ddac408d7 > > > > > > > > > On Fri, Sep 9, 2016 at 11:38 AM David Alves <davidral...@gmail.com> > > wrote: > > > > > > > Wait, If you _do_ set a projection on the scanner that does not > include > > > the > > > > keys, then they won't be returned (and won't appear on the > projection's > > > > schema). > > > > Note that this does not mean that you can't set predicates on the > key, > > > it's > > > > just that they'll be evaluated server side, but the key won't > actually > > be > > > > returned. > > > > Maybe I'm misunderstanding what you're saying? > > > > Care to post a gist with the offending code? > > > > > > > > -david > > > > > > > > > > > > > > > > On Fri, Sep 9, 2016 at 8:26 AM, Jordan Birdsell < > > > jordantbirds...@gmail.com > > > > > > > > > wrote: > > > > > > > > > Hey David, > > > > > > > > > > Yep, i'm sure, taking a look at the scan_configuration class, the > > issue > > > > > seems to be here: > > > > > > > > > > Status ScanConfiguration::SetProjectedColumnIndexes(const > > vector<int>& > > > > > col_indexes) { > > > > > .... > > > > > RETURN_NOT_OK*(s->Reset(cols, 0));* > > > > > .... > > > > > > > > > > In the SetProjectedColumnIndexes method (which is also used by > > > > > SetProjectedColumnNames), we're setting the schema without the > index. > > > > > > > > > > There are probably a couple of ways to address this: > > > > > > > > > > 1. Check if all key columns are in the projection, and if so, > > > maintain > > > > > the key. > > > > > 2. Provide an optional parameter to be able to set the key to > > users > > > > > preference for the new projection. This would be beneficial for > > > cases > > > > > where > > > > > the user may want to create a new table based on their > projection. > > > > > > > > > > Thoughts? > > > > > > > > > > Jordan > > > > > > > > > > On Fri, Sep 9, 2016 at 11:08 AM David Alves <davidral...@gmail.com > > > > > > wrote: > > > > > > > > > > > Hi Jordan > > > > > > > > > > > > KuduScanner::GetProjectionSchema returns the schema of the > > > projection > > > > > > that was previously set on the scanner. If you don't a projection > > it > > > > > should > > > > > > indeed return all the columns. > > > > > > Are you sure you didn't set a projection (with > > > > SetProjectedColumnNames > > > > > > or SetProjectedColumnIndexes) that excluded the key? > > > > > > > > > > > > Best > > > > > > David > > > > > > > > > > > > On Fri, Sep 9, 2016 at 5:16 AM, Jordan Birdsell < > > > > > jordantbirds...@gmail.com > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hey folks, > > > > > > > > > > > > > > i was doing some work on KUDU-854 > > > > > > > <https://issues.apache.org/jira/browse/KUDU-854> and when > > testing > > > > the > > > > > > > KuduScanner::GetProjectionSchema method call, found that the > key > > > was > > > > > > being > > > > > > > dropped, which makes this much more challenging to test. Any > > ideas > > > if > > > > > it > > > > > > is > > > > > > > intended to drop the key information in a scanner projection? I > > > would > > > > > > > imagine this could prevent functionality like creating new > tables > > > > from > > > > > a > > > > > > > projection. > > > > > > > > > > > > > > Thanks, > > > > > > > Jordan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Todd Lipcon > Software Engineer, Cloudera >