Performance Question

2016-05-27 Thread Benjamin Kim
I am just curious. How will Kudu compare with Aerospike (http://www.aerospike.com)? I went to a Spark Roadshow and found out about this piece of software. It appears to fit our use case perfectly since we are an ad-tech company trying to leverage our user profiles data. Plus, it already has a

Re: Performance Question

2016-05-27 Thread Benjamin Kim
ey have addressed any of those issues. > > Mike > > On Friday, May 27, 2016, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > I am just curious. How will Kudu compare with Aerospike > (http://www.aerospike.com <http://www.aerospike.com/

Re: [ANNOUNCE] Apache Kudu (incubating) 0.9.0 released

2016-06-13 Thread Benjamin Kim
Hi J-D, I would like to get this started, especially now that there is UPSERT and Spark SQL DataFrames support. But how do I use Cloudera Manager to deploy it? Is there a parcel available yet? Is there a new CSD file to be downloaded? I currently have CM 5.7.0 installed. Thanks, Ben > On Jun 10,

Re: Performance Question

2016-06-15 Thread Benjamin Kim
nt to try a table with replication count 1 > > On Jun 15, 2016 5:26 PM, "Benjamin Kim" <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Todd, > > I did a simple test of our ad events. We stream using Spark Streaming > directly into HBase,

Re: Performance Question

2016-06-15 Thread Benjamin Kim
really do some conclusive tests? I want to see if I can match your results on my 50 node cluster. Thanks, Ben > On May 30, 2016, at 10:33 AM, Todd Lipcon <t...@cloudera.com> wrote: > > On Sat, May 28, 2016 at 7:12 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbui

Re: Performance Question

2016-06-15 Thread Benjamin Kim
part that scares most users is when it comes to joining this data with other dimension/3rd party events tables because of the sheer size of it. We do what most companies do, similar to what I saw in earlier presentations of Kudu. We dump data out of HBase into partitioned Parquet tables to make qu

Re: Spark on Kudu

2016-05-28 Thread Benjamin Kim
0/#/c/2992/5/docs/developing.adoc > <http://gerrit.cloudera.org:8080/#/c/2992/5/docs/developing.adoc> > > -Chris George > > > On 5/18/16, 9:45 AM, "Benjamin Kim" <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > > Can someone

Re: Performance Question

2016-05-28 Thread Benjamin Kim
where support will be built in? Thanks, Ben > On May 27, 2016, at 9:19 PM, Todd Lipcon <t...@cloudera.com> wrote: > > On Fri, May 27, 2016 at 8:20 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Mike, > > First of

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
ould you try: > > import org.kududb.client._ > and try again? > > - Dan > > On Tue, Jun 14, 2016 at 4:01 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > I encountered an error trying to create a table based on the documentation > fr

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
found: key not found (error 0) Not found: key not found (error 0) Not found: key not found (error 0) Not found: key not found (error 0) Does the key field need to be first in the DataFrame? Thanks, Ben > On Jun 14, 2016, at 4:28 PM, Dan Burkert <d...@cloudera.com> wrote: > > &

Re: Spark on Kudu

2016-06-14 Thread Benjamin Kim
would happen if I “overwrite” existing data when the DataFrame has data in it that does not exist in the Kudu table? I need to evaluate the best way to simulate the UPSERT behavior in HBase because this is what our use case is. Thanks, Ben > On Jun 14, 2016, at 5:05 PM, Benjamin Kim <

Re: Spark on Kudu

2016-06-17 Thread Benjamin Kim
ite = truncate + insert. I think that may match the normal > spark semantics more closely. > > - Dan > > On Tue, Jun 14, 2016 at 6:00 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Dan, > > Thanks for the information. That w

Re: Spark on Kudu

2016-06-17 Thread Benjamin Kim
e semantics will be, but at least one of them >> will be upsert. These modes come from spark, and they were really designed >> for file-backed storage and not table storage. We may want to do append = >> upsert, and overwrite = truncate + insert. I think that may match

Re: Spark on Kudu

2016-06-20 Thread Benjamin Kim
psert. These modes come from spark, and they were really designed >> for file-backed storage and not table storage. We may want to do append = >> upsert, and overwrite = truncate + insert. I think that may match the >> normal spark semantics more closely. >> >> - Da

Re: Spark on Kudu

2016-06-15 Thread Benjamin Kim
, and they were really designed > for file-backed storage and not table storage. We may want to do append = > upsert, and overwrite = truncate + insert. I think that may match the normal > spark semantics more closely. > > - Dan > > On Tue, Jun 14, 2016 at 6:00 PM, Benjam
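
Editor's sketch: the proposed mapping quoted above (append = upsert, overwrite = truncate + insert) can be illustrated with a toy in-memory table. This is not the kudu-spark implementation; the function names, the dict-based table, and the choice to merge only the supplied columns on upsert are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# A row is (primary key, column values). Hypothetical shape for illustration.
Row = Tuple[str, Dict[str, object]]
Table = Dict[str, Dict[str, object]]


def append_as_upsert(table: Table, rows: Iterable[Row]) -> None:
    """Model 'append' as upsert: insert unseen keys, update existing ones."""
    for key, values in rows:
        # Assumption: an upsert only touches the columns it supplies.
        table[key] = {**table.get(key, {}), **values}


def overwrite_as_truncate_insert(table: Table, rows: Iterable[Row]) -> None:
    """Model 'overwrite' as truncate + insert: drop all rows, then load."""
    table.clear()
    for key, values in rows:
        if key in table:
            # A plain insert rejects duplicate keys within the new batch.
            raise KeyError(f"duplicate key in batch: {key}")
        table[key] = dict(values)


profiles: Table = {"u1": {"clicks": 1, "country": "US"}}
append_as_upsert(profiles, [("u1", {"clicks": 5}), ("u2", {"clicks": 2})])
after_append = {k: dict(v) for k, v in profiles.items()}
overwrite_as_truncate_insert(profiles, [("u3", {"clicks": 9})])
```

Under these assumptions, rows in the DataFrame that do not yet exist in the table are simply inserted by "append", which is the HBase-like behavior asked about earlier in the thread.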

Re: Kudu Release

2016-02-23 Thread Benjamin Kim
KzHfL2xcmKTScU-rhLcQFSns1UVSbrXhw%40mail.gmail.com%3E> > > Thanks, > > J-D > > On Tue, Feb 23, 2016 at 8:23 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Any word as to the release roadmap? > > Thanks, > Ben >

Spark on Kudu

2016-02-24 Thread Benjamin Kim
I see that KUDU-1214 is targeted for 0.8.0, but I see no progress on it. When this is complete, will this mean that Spark will be able to work with Kudu both programmatically and as a client via Spark SQL? Or is there more work that needs to be done

Re: Spark on Kudu

2016-02-24 Thread Benjamin Kim
UDU-1321 > <https://issues.cloudera.org/browse/KUDU-1321> > > It's a really simple wrapper, and yes you can use SparkSQL on Kudu, but it > will require a lot more work to make it fast/useful. > > Hope this helps, > > J-D > > On Wed, Feb 24, 2016 at 3:08

Re: Spark on Kudu

2016-03-01 Thread Benjamin Kim
Hi J-D, Quick question… Is there an ETA for KUDU-1214? I want to target a version of Kudu to begin real testing of Spark against it for our devs. At least, I can tell them what timeframe to anticipate. Just curious, Benjamin Kim Data Solutions Architect [a•mo•bee] (n.) the company defining

Re: Spark on Kudu

2016-04-12 Thread Benjamin Kim
hat's as fully featured as Impala's? Do they > care being able to insert into Kudu with SparkSQL or just being able to query > real fast? Anything more specific to Spark that I'm missing? > > FWIW the plan is to get to 1.0 in late Summer/early Fall. At Cloudera all our > resource

Re: Spark on Kudu

2016-04-10 Thread Benjamin Kim
coming from… Cheers, Ben > On Apr 10, 2016, at 11:08 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote: > > On Sun, Apr 10, 2016 at 12:30 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > J-D, > > The main thing I hear that Cass

Re: Spark on Kudu

2016-04-13 Thread Benjamin Kim
implement similar functionality through the api. > -Chris > > On 4/12/16, 5:19 PM, "Benjamin Kim" <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > > It would be nice to adhere to the SQL:2003 standard for an “upsert” if it > were to be

Sparse Data

2016-05-12 Thread Benjamin Kim
Can Kudu handle the use case where sparse data is involved? In many of our processes, we deal with data that can have any number of columns and many previously unknown column names depending on what attributes are brought in at the time. Currently, we use HBase to handle this. Since Kudu is

Re: Spark on Kudu

2016-05-18 Thread Benjamin Kim
port these type of statements but we may be able to > implement similar functionality through the api. > -Chris > > On 4/12/16, 5:19 PM, "Benjamin Kim" <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > > It would be nice to adhere to the SQL:200

Re: Performance Question

2016-07-18 Thread Benjamin Kim
(unknown) @ 0x344d41ed5d (unknown) @ 0x7811d1 (unknown) Does anyone know what this means? Thanks, Ben > On Jul 11, 2016, at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote: > > On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...

Re: Performance Question

2016-07-18 Thread Benjamin Kim
t the server. It will > recreate a new repaired replica automatically. > > -Todd > > On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > During my re-population of the Kudu table, I am getting this erro

Re: Performance Question

2016-07-18 Thread Benjamin Kim
<t...@cloudera.com> wrote: > > On Mon, Jul 18, 2016 at 10:31 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Todd, > > Thanks for the info. I was going to upgrade after the testing, but now, it > looks like I will have to do it earlier

Re: Performance Question

2016-07-11 Thread Benjamin Kim
Todd, It’s no problem to start over again. But a tool like that would be helpful. Gaps in the data can be accommodated by just backfilling. Thanks, Ben > On Jul 11, 2016, at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote: > > On Mon, Jul 11, 2016 at 10:40 AM, Benj

Re: Performance Question

2016-07-11 Thread Benjamin Kim
Todd > > On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim <b...@amobee.com > <mailto:b...@amobee.com>> wrote: > Over the weekend, a tablet server went down. It’s not coming back up. So, I > decommissioned it and removed it from the cluster. Then, I restarted Kudu > be

Re: Performance Question

2016-06-28 Thread Benjamin Kim
guide > <http://getkudu.io/docs/schema_design.html#data-distribution>. We generally > recommend sticking to hash partitioning if possible, since you don't have to > determine your own split rows. > > - Dan > > On Wed, Jun 15, 2016 at 9:17 AM, Benjamin Kim <bbuil.
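
Editor's sketch: the advice above contrasts hash partitioning with range partitioning, where you must pick split rows yourself. The toy functions below show the difference in general terms only. They do not reproduce Kudu's actual hash function or tablet layout; `hash_bucket`, `range_partition`, and the use of CRC32 are assumptions chosen for illustration.

```python
import bisect
import zlib
from typing import List


def hash_bucket(key: str, num_buckets: int) -> int:
    """Assign a key to a bucket by hashing; no split rows are needed."""
    # CRC32 is used only because it is stable across runs (an assumption,
    # not the hash Kudu uses).
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def range_partition(key: str, split_rows: List[str]) -> int:
    """Assign a key to a range partition given sorted, hand-picked splits."""
    # Keys below the first split go to partition 0; a key equal to a split
    # row falls into the partition that starts at that split.
    return bisect.bisect_right(split_rows, key)


buckets = [hash_bucket(f"user-{i}", 4) for i in range(1000)]
ranges = [range_partition(k, ["g", "p"]) for k in ["a", "g", "m", "z"]]
```

With hashing, the only choice is the bucket count. With ranges, poorly chosen split rows can leave some partitions nearly empty and others overloaded, which is the burden the reply is steering away from.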

Re: Performance Question

2016-07-08 Thread Benjamin Kim
in production”, as management tends to say. Cheers, Ben > On Jul 6, 2016, at 9:46 AM, Dan Burkert <d...@cloudera.com> wrote: > > > > On Wed, Jul 6, 2016 at 7:05 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Over the weekend, th

Re: Performance Question

2016-07-06 Thread Benjamin Kim
ata frame or SQL, etc? Maybe you can share the schema and > some queries > > Todd > > Todd > > On Jun 15, 2016 8:10 AM, "Benjamin Kim" <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Todd, > > Now that Kudu 0.9.0 is out

Re: Performance Question

2016-06-30 Thread Benjamin Kim
or SQL, etc? Maybe you can share the schema and > some queries > > Todd > > Todd > > On Jun 15, 2016 8:10 AM, "Benjamin Kim" <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Todd, > > Now that Kudu 0.9.0 is out. I h

Re: Performance Question

2016-06-29 Thread Benjamin Kim
seconds you're seeing is constant overhead from Spark job setup, etc, given > that the performance doesn't seem to get slower as you went from 700K rows to > 13M rows. > > -Todd > > On Tue, Jun 28, 2016 at 3:03 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:b

Re: Performance Question

2016-06-29 Thread Benjamin Kim
, Todd Lipcon <t...@cloudera.com> wrote: > > On Wed, Jun 29, 2016 at 11:32 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Todd, > > I started Spark streaming more events into Kudu. Performance is great there > too! With HBase