Hi Dan,

Thanks for the feedback. I'm thinking that the PK could be specified by passing in a tuple of column indices to use for the key (assuming that composite keys are supported?).
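
To make the column-index idea concrete, here is a rough sketch of how the key columns might be resolved from a schema. It is only an illustration: `Field`, `Schema`, `KeySelection` and `keyColumns` are hypothetical names, with `Field` and `Schema` standing in for Spark's StructField/StructType so that the snippet runs without Spark or Kudu on the classpath. The nullability check reflects my understanding that Kudu key columns cannot be nullable.

```scala
// Hypothetical stand-ins for Spark's StructField / StructType,
// so this sketch compiles without any Spark or Kudu dependency.
final case class Field(name: String, dataType: String, nullable: Boolean)
final case class Schema(fields: Vector[Field])

object KeySelection {
  // Resolve the given column indices against the schema. The order of the
  // indices is preserved, since column order matters for a composite key.
  def keyColumns(schema: Schema, keyIndices: Seq[Int]): Either[String, Vector[Field]] =
    if (keyIndices.isEmpty) Left("at least one key column is required")
    else if (keyIndices.distinct.size != keyIndices.size) Left("duplicate key column index")
    else keyIndices.find(i => i < 0 || i >= schema.fields.size) match {
      case Some(bad) => Left(s"key column index $bad is out of range")
      case None =>
        val cols = keyIndices.map(schema.fields).toVector
        // Assumption: key columns must be non-nullable.
        cols.find(_.nullable) match {
          case Some(f) => Left(s"key column ${f.name} must not be nullable")
          case None    => Right(cols)
        }
    }
}

object Example {
  // A cut-down TPC-H lineitem schema, whose key is (l_orderkey, l_linenumber).
  val lineitem: Schema = Schema(Vector(
    Field("l_orderkey", "long", nullable = false),
    Field("l_linenumber", "int", nullable = false),
    Field("l_comment", "string", nullable = true)
  ))

  // Composite key made of the first two columns.
  def key: Either[String, Vector[Field]] = KeySelection.keyColumns(lineitem, Seq(0, 1))
}
```

Returning an Either keeps a bad key specification (empty, duplicated, out of range, or nullable) as an ordinary value the caller can report, rather than an exception thrown halfway through creating a table.
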
Maybe it would be useful for KuduContext to have createTable(name: String, schema: StructType) and dropTable(name: String) methods that can eventually be wrapped in Spark SQL?

I'm just getting up to speed with Kudu by trying to run TPC-H via Spark. The use case driving the current code was being able to take existing DataFrames (created by reading CSV files) and save them to Kudu so that we can then start testing the TPC-H SQL queries. We will be experimenting with both Spark SQL and our own SQL parser / query planner for Spark, which we are planning on open sourcing too.

Thanks,

Andy.

On Thu, May 5, 2016 at 7:16 PM, Dan Burkert <[email protected]> wrote:

> Hey Andy,
>
> Thanks for the patch! I left you some specific feedback on the gerrit
> review, but I want to discuss the high level approach a bit. I think the
> patch as it's written now is going to have limited use, because it doesn't
> allow for specifying primary keys or partitioning, which are critical for
> correctness and performance. In the long run we will definitely want to be
> able to create tables through Spark SQL, but perhaps we should start off
> with just inserting/updating rows in existing tables. It would be
> interesting to see how other databases solved this problem, since I'm sure
> we're not the only ones with configuration options on table create. The
> relational databases in particular must have PK options.
>
> - Dan
>
> On Thu, May 5, 2016 at 5:51 PM, Andy Grove <[email protected]> wrote:
>
> > Hi,
> >
> > I'm working with some colleagues at AgilData on Spark/Kudu integration and
> > we expect to be able to contribute a number of features to the code base.
> >
> > To kick things off, here is a gerrit for discussion that adds support for
> > persisting a DataFrame to a Kudu table. It would be great to hear feedback
> > and feature requests for this capability.
> >
> > http://gerrit.cloudera.org:8080/#/c/2969/
> >
> > Thanks,
> >
> > Andy.
