Re: Hadoop vs Cassandra

2016-10-24 Thread Stefania Alborghetti
If you intend to use files on HDFS, I would recommend using Parquet files.
It's a very fast columnar format that allows querying data very
efficiently. I believe a Spark data frame will take care of saving all the
columns in a Parquet file. So you could extract the data from Cassandra via
the Spark connector and save it to Parquet.

Or you can query Cassandra data directly from Spark, but it won't be as
fast as Parquet.

The trade-off depends on how much data you save to Parquet, how often you
export it, how many queries you run, which format you use, and whether you
can tolerate some stale data.
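
To illustrate why a columnar layout helps analytical queries, here is a
minimal pure-Python sketch. It is not Parquet itself and does not use the
Spark connector API; the sample rows and the `to_columns` helper are made
up for illustration, and every record is assumed to have the same fields.

```python
# Hypothetical rows, as they might be exported from a Cassandra table.
rows = [
    {"user_id": 1, "country": "ID", "age": 31},
    {"user_id": 2, "country": "HK", "age": 27},
    {"user_id": 3, "country": "ID", "age": 45},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
avg_age_from_rows = sum(record["age"] for record in rows) / len(rows)

# Columnar layout: only the "age" column is read; the others are untouched.
avg_age_from_columns = sum(columns["age"]) / len(columns["age"])
```

Both computations give the same answer; the difference is how much data
has to be read to get there, which is the effect a columnar file format
exploits on disk.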


On Sun, Oct 23, 2016 at 7:18 PM, Welly Tambunan  wrote:

> Another thing:
>
> Let's say that we already have structured data. Is the way to load it into
> HDFS to turn it into files?
>
> Cheers



-- 


Stefania Alborghetti

|+852 6114 9265| stefania.alborghe...@datastax.com


Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
Another thing:

Let's say that we already have structured data. Is the way to load it into
HDFS to turn it into files?

Cheers

On Sun, Oct 23, 2016 at 6:18 PM, Welly Tambunan  wrote:

> So basically you store those files in HDFS and use Spark to process them?



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
So basically you store those files in HDFS and use Spark to process them?

On Sun, Oct 23, 2016 at 6:03 PM, Joaquin Alzola 
wrote:

>
>
> I think what Ali mentions is correct:
>
> If you need a lot of queries that require joins, or complex analytics of
> the kind that Cassandra isn't suited for, then HDFS / HBase may be better.
>
>
>
> We have files in which each line contains 500 pipe-separated fields, and
> each of these fields is important.
>
> Cassandra will not manage that, since you would need 500 indexes. HDFS is
> the proper way.





RE: Hadoop vs Cassandra

2016-10-23 Thread Joaquin Alzola

I think what Ali mentions is correct:
If you need a lot of queries that require joins, or complex analytics of the 
kind that Cassandra isn't suited for, then HDFS / HBase may be better.

We have files in which each line contains 500 pipe-separated fields, and 
each of these fields is important.
Cassandra will not manage that, since you would need 500 indexes. HDFS is the 
proper way.
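
A minimal sketch of the kind of ad hoc scan this implies: any field of a
pipe-delimited file can be grouped on without a per-field index, at the
cost of reading every line. The sample data (three fields rather than 500)
and the `count_by_field` helper are hypothetical, not taken from our files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: three pipe-separated fields per line instead of 500.
sample = "alice|ES|prepaid\nbob|UK|postpaid\ncarol|ES|postpaid\n"


def count_by_field(lines, field_index):
    """Count records per distinct value of one field by scanning every line."""
    reader = csv.reader(lines, delimiter="|")
    return Counter(record[field_index] for record in reader)


# Any field can be queried the same way; no index has to exist beforehand.
by_country = count_by_field(io.StringIO(sample), 1)
by_plan = count_by_field(io.StringIO(sample), 2)
```

On a cluster the same full scan would be split across many machines, but
the idea is unchanged: the query reads the whole file rather than relying
on one index per field.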


From: Welly Tambunan [mailto:if05...@gmail.com]
Sent: 23 October 2016 10:19
To: user@cassandra.apache.org
Subject: Re: Hadoop vs Cassandra


I like multi-data-centre resilience in Cassandra.

I think that's a plus for Cassandra.

Ali, complex analytics can be done in Spark, right?


This email is confidential and may be subject to privilege. If you are not the 
intended recipient, please do not copy or disclose its content but contact the 
sender immediately upon receipt.


Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
"from a particular query" should be "from a particular country"

On Sun, Oct 23, 2016 at 2:36 PM, Ali Akhtar  wrote:

> They can be, but I would assume that if your Cassandra data model is
> inefficient for the kind of queries you want to do, Spark won't magically
> take that away.
>
> For example, say you have a users table. Each user has a country, which
> isn't a partitioning key or clustering key.
>
> If you wanted to calculate the number of all users from a particular
> query, there's no way to do that in the previous data model other than to
> do a full table scan and count the users from that country.
>
> Spark can do this full table scan for you and return the number of
> records. Maybe it can spread the work across multiple servers. But it
> can't reduce the amount of work that has to be done.
>
> OTOH, if you were okay with creating a new table in which the country is
> the partition key, and for each user that signed up you also created a
> record in this user_by_country table, then it would be a very fast query to
> look up the users in a particular country, as that becomes a
> single-partition read.


Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
They can be, but I would assume that if your Cassandra data model is
inefficient for the kind of queries you want to do, Spark won't magically
take that away.

For example, say you have a users table. Each user has a country, which
isn't a partitioning key or clustering key.

If you wanted to calculate the number of all users from a particular query,
there's no way to do that in the previous data model other than to do a
full table scan and count the users from that country.

Spark can do this full table scan for you and return the number of records.
Maybe it can spread the work across multiple servers. But it can't reduce
the amount of work that has to be done.

OTOH, if you were okay with creating a new table in which the country is
the partition key, and for each user that signed up you also created a
record in this user_by_country table, then it would be a very fast query to
look up the users in a particular country, as that becomes a
single-partition read.
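
A rough sketch of that difference, using plain Python dictionaries as
stand-ins for the two tables. This is only an illustration of the access
pattern, not Cassandra or Spark code; `sign_up`, `count_by_full_scan` and
the sample users are hypothetical.

```python
from collections import defaultdict

# Stand-in for the original table, keyed by user id only.
users = {}

# Stand-in for the extra query table: country -> ids of users in it.
users_by_country = defaultdict(set)


def sign_up(user_id, name, country):
    """Write the user twice: once per table (extra write for a faster read)."""
    users[user_id] = {"name": name, "country": country}
    users_by_country[country].add(user_id)


def count_by_full_scan(table, country):
    """Without the query table, every row must be inspected."""
    return sum(1 for row in table.values() if row["country"] == country)


sign_up("u1", "Ana", "ES")
sign_up("u2", "Budi", "ID")
sign_up("u3", "Citra", "ID")
sign_up("u4", "Dewi", "ID")

scan_count = count_by_full_scan(users, "ID")    # touches all four rows
lookup_count = len(users_by_country["ID"])      # touches one "partition"
```

Both counts agree; the scan does work proportional to the whole table,
while the lookup only reads the entries stored under that one country.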



On Sun, Oct 23, 2016 at 2:18 PM, Welly Tambunan  wrote:

> I like multi-data-centre resilience in Cassandra.
>
> I think that's a plus for Cassandra.
>
> Ali, complex analytics can be done in Spark, right?


Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
I like multi-data-centre resilience in Cassandra.

I think that's a plus for Cassandra.

Ali, complex analytics can be done in Spark, right?

On 23 Oct 2016 4:08 p.m., "Ali Akhtar"  wrote:

> I would say it depends on your use case.
>
> If you need a lot of queries that require joins, or complex analytics of
> the kind that Cassandra isn't suited for, then HDFS / HBase may be better.
>
> If you can work with the Cassandra way of doing things (creating a new
> table for each query you'll need, duplicating data, doing extra writes
> for faster reads), then Cassandra should work for you. It is easier to
> set up and do DevOps with, in my experience.


Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
I would say it depends on your use case.

If you need a lot of queries that require joins, or complex analytics of
the kind that Cassandra isn't suited for, then HDFS / HBase may be better.

If you can work with the Cassandra way of doing things (creating a new table
for each query you'll need, duplicating data, doing extra writes for
faster reads), then Cassandra should work for you. It is easier to set up
and do DevOps with, in my experience.

On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan  wrote:

> I mean HDFS and HBase.


Re: Hadoop vs Cassandra

2016-10-23 Thread Ben Slater
It’s reasonably common to use Cassandra to cover both online and analytics
requirements, particularly using it in conjunction with Spark. You can use
Cassandra’s multi-DC functionality to have online and analytics DCs for a
reasonable degree of workload separation without having to build ETL (or
some other replication) to get data between two environments.

On Sun, 23 Oct 2016 at 20:00 Ali Akhtar  wrote:

> By Hadoop do you mean HDFS?


Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
I mean HDFS and HBase.

On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar  wrote:

> By Hadoop do you mean HDFS?




Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
By Hadoop do you mean HDFS?



On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan  wrote:

> Hi All,
>
> I read the following comparison between Hadoop and Cassandra. The
> conclusion seems to be that we should use Hadoop for the data lake (cold
> data) and Cassandra for hot data (real-time data).
>
> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>
> My question is, can we just use Cassandra to rule them all?
>
> What we are trying to achieve is to minimize the moving parts in our
> system.
>
> Any response would be really appreciated.
>
>
> Cheers
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>