On Fri, May 27, 2016 at 8:20 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Hi Mike,
>
> First of all, thanks for the link. It looks like an interesting read. I
> checked that Aerospike is currently at version 3.8.2.3, and in the article,
> they are evaluating version 3.5.4. The main thing that impressed me was
> their claim that they can beat Cassandra and HBase by 8x for writing and
> 25x for reading. Their big claim to fame is that Aerospike can write 1M
> records per second with only 50 nodes. I wanted to see if this is real.
>

1M records per second on 50 nodes is pretty doable by Kudu as well,
depending on the size of your records and the insertion order. I've been
playing with a ~70 node cluster recently and seen 1M+ writes/second
sustained, and bursting above 4M. These are 1KB rows with 11 columns, and
with pretty old HDD-only nodes. I think newer flash-based nodes could do
better.
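
To put those figures on a per-node basis, here is a rough back-of-the-envelope
sketch in Python. It assumes the load spreads evenly across nodes, treats
"1KB" as 1000 bytes, and ignores replication and any write amplification; the
function names are made up for illustration only.

```python
# Back-of-the-envelope per-node throughput, assuming an even spread of load.
# All helper names here are hypothetical, not part of any Kudu API.

def per_node_rate(total_rows_per_sec, nodes):
    """Rows per second each node must absorb if load is spread evenly."""
    return total_rows_per_sec / nodes


def per_node_mb_per_sec(total_rows_per_sec, nodes, row_bytes):
    """Approximate ingest bandwidth per node in MB/s (1 MB = 1e6 bytes)."""
    return per_node_rate(total_rows_per_sec, nodes) * row_bytes / 1e6


# The 1M records/second on 50 nodes claim discussed in this thread.
print(per_node_rate(1_000_000, 50))

# The ~70 node cluster: 1M rows/second sustained, 4M rows/second burst.
print(per_node_rate(1_000_000, 70))
print(per_node_rate(4_000_000, 70))

# Sustained ingest bandwidth per node with ~1KB rows (assumed 1000 bytes).
print(per_node_mb_per_sec(1_000_000, 70, 1000))
```

Seen this way, the sustained rate works out to roughly 14 thousand rows
(about 14 MB) per second per node before replication, which is why the
record size and insertion order matter so much for what a cluster can do.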


>
> To answer your questions, we have a DMP with user profiles with many
> attributes. We create segmentation information off of these attributes to
> classify them. Then, we can target advertising appropriately for our sales
> department. Much of the data processing applies models to all, or at least
> most, of each profile’s attributes to find similarities (nearest
> neighbor/clustering), over a large number of rows when batch processing or a
> small subset of rows for quick online scoring. So, our use case is a
> typical advanced analytics scenario. We have tried HBase, but it doesn’t
> work well for these types of analytics.
>
> I read in the Aerospike release notes that they made many improvements
> to batch and scan operations.
>
> I wonder what your thoughts are for using Kudu for this.
>

Sounds like a good Kudu use case to me. I've heard great things about
Aerospike for the low latency random access portion, but I've also heard
that it's _very_ expensive, and not particularly suited to the columnar
scan workload. Lastly, I think the Apache license of Kudu is much more
appealing than the AGPL3 used by Aerospike. But, that's not really a direct
answer to the performance question :)


>
> Thanks,
> Ben
>
>
> On May 27, 2016, at 6:21 PM, Mike Percy <mpe...@cloudera.com> wrote:
>
> Have you considered whether you have a scan heavy or a random access heavy
> workload? Have you considered whether you always access / update a whole
> row vs only a partial row? Kudu is a column store so has some
> awesome performance characteristics when you are doing a lot of scanning of
> just a couple of columns.
>
> I don't know the answer to your question but if your concern is
> performance then I would be interested in seeing comparisons from a perf
> perspective on certain workloads.
>
> Finally, a year ago Aerospike did quite poorly in a Jepsen test:
> https://aphyr.com/posts/324-jepsen-aerospike
>
> I wonder if they have addressed any of those issues.
>
> Mike
>
> On Friday, May 27, 2016, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> I am just curious. How will Kudu compare with Aerospike (
>> http://www.aerospike.com)? I went to a Spark Roadshow and found out
>> about this piece of software. It appears to fit our use case perfectly
>> since we are an ad-tech company trying to leverage our user profiles data.
>> Plus, it already has a Spark connector and has a SQL-like client. The
>> tables can be accessed using Spark SQL DataFrames and, also, made into SQL
>> tables for direct use with Spark SQL ODBC/JDBC Thriftserver. I see from the
>> work done here http://gerrit.cloudera.org:8080/#/c/2992/ that the Spark
>> integration is well underway and, from the looks of it lately, almost
>> complete. I would prefer to use Kudu since we are already a Cloudera shop,
>> and Kudu is easy to deploy and configure using Cloudera Manager. I also
>> hope that some of Aerospike’s speed optimization techniques can make it
>> into Kudu in the future, if they have not already been thought of or
>> included.
>>
>> Just some thoughts…
>>
>> Cheers,
>> Ben
>
>
>
> --
> --
> Mike Percy
> Software Engineer, Cloudera
>
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
