Re: Performance Question

2016-07-06 Thread Dan Burkert
On Wed, Jul 6, 2016 at 7:05 AM, Benjamin Kim wrote: > Over the weekend, the row count is up to <500M. I will give it another few > days to get to 1B rows. I still get consistent times ~15s for doing row > counts despite the amount of data growing. > > On another note, I got a

Re: Performance Question

2016-07-06 Thread Dan Burkert
On Mon, Jul 4, 2016 at 2:46 AM, 袁康(梓悠) wrote: > How can I delete data in kudu table with spark (not delete the table at > all)? > We do not currently have a way to delete a Kudu table through the spark connector, but you should be able to instantiate a Kudu client

Re: Spark on Kudu

2016-06-17 Thread Dan Burkert
ere is no plan to have auto increment in Kudu. Distributed, consistent, auto incrementing counters is a difficult problem, and I don't think there are any known solutions that would be fast enough for Kudu (happy to be proven wrong, though!). - Dan > > Thanks, > Ben > > On Jun

Re: Performance Question

2016-06-15 Thread Dan Burkert
Adding partition splits when range partitioning is done via the CreateTableOptions.addSplitRow method. You can find more about the different partitioning options in the schema design

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
> Ben > > On Jun 14, 2016, at 5:57 PM, Dan Burkert <d...@cloudera.com> wrote: > > Right now append uses an update Kudu operation, which requires the row > already be present in the table. Overwrite maps to insert. Kudu very > recently got upsert support baked in, but it hasn't

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
und (error 0)Not found: key not > found (error 0)Not found: key not found (error 0)Not found: key not found > (error 0)Not found: key not found (error 0) > > Does the key field need to be first in the DataFrame? > > Thanks, > Ben > > On Jun 14, 2016, at 4:28 PM, Dan Burkert <

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
in this case "my_id" is the one and only valid combination). However, the call to `addHashPartitions` also takes the number of buckets as the second param. You shouldn't get the IllegalArgumentException as long as you are specifying either `addHashPartitions` or `setRangePartitionColumns`. -

Re: Spark on Kudu

2016-06-14 Thread Dan Burkert
Looks like we're missing an import statement in that example. Could you try: import org.kududb.client._ and try again? - Dan On Tue, Jun 14, 2016 at 4:01 PM, Benjamin Kim wrote: > I encountered an error trying to create a table based on the documentation > from a

Re: Proposal: remove default partitioning for new tables

2016-05-26 Thread Dan Burkert
ing with Kudu and the "default" behavior has already tripped me up a > > couple times. > > > > Thanks, > > > > Abhi > > > > On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <danburk...@apache.org> > > wrote: > > > >> Hi a

Re: Partition and Split rows

2016-05-16 Thread Dan Burkert
isplayed in log messages and on the web UI. - Dan > On Sat, May 7, 2016 at 9:20 PM, Dan Burkert <d...@cloudera.com> wrote: > >> Hi Sand, >> >> I've been working on some diagrams to help explain some of the more >> advanced partitioning types, it's attached. S

Re: Sparse Data

2016-05-12 Thread Dan Burkert
Hi Ben, Kudu doesn't support sparse datasets with many columns very well. Kudu's data model looks much more like the relational, structured data model of a traditional SQL database than HBase's data model. Kudu doesn't yet have a map column type (or any nested column types), but we do have

Re: Partition and Split rows

2016-05-12 Thread Dan Burkert
nt strategy in Kudu, since each tablet server should only have on the order of 10-20 tablets. Instead, take advantage of the index capability of Primary Keys. - Dan > On Thu, May 12, 2016 at 11:13 AM, Dan Burkert <d...@cloudera.com> wrote: > >> Forgot to add the PK speci

Re: Partition and Split rows

2016-05-12 Thread Dan Burkert
Forgot to add the PK specification to the CREATE TABLE, it should have read as follows: CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE) PRIMARY KEY (metric, time); - Dan On Thu, May 12, 2016 at 11:12 AM, Dan Burkert <d...@cloudera.com> wrote: > > On Thu, May 12

Re: best practices to remove/retire data

2016-05-12 Thread Dan Burkert
On Thu, May 12, 2016 at 8:32 AM, Chris George wrote: > How hard would a predicate based delete be? > Ie ScanDelete or something. > -Chris George > That might be pretty difficult, since it implicitly assumes cross row transactional consistency. If consistency isn't

Re: Please welcome Binglin Chang as a Kudu committer and PPMC member

2016-04-05 Thread Dan Burkert
Welcome, Binglin! - Dan On Mon, Apr 4, 2016 at 9:28 PM, Mike Percy wrote: > Welcome aboard, Binglin! Looking forward to your continued contributions to > the project! > > Best, > Mike > > On Mon, Apr 4, 2016 at 9:11 PM, Todd Lipcon wrote: > > > Hi Kudu

Re: sparkContext won't stop when using spark-kudu

2016-03-19 Thread Dan Burkert
Hi Darren, I found the culprit, and I've put up a patch here <http://gerrit.cloudera.org:8080/#/c/2571/>. Should make it into the next release (0.8.0). Until then, stopping the shell with the 'exit' command or Ctrl-C should do the trick. - Dan On Tue, Mar 15, 2016 at 12:04 PM, Dan Burk