Hi Srinath,

I think that example is a bit flawed :) .. I didn't mean to compare
Cassandra with the HDFS case here; I know Cassandra is far more complicated
than HDFS, where the data operations are very simple. I have a feeling
that, with that many small events, it may have turned into a CPU-bound
operation rather than an I/O-bound one, because of the processing required
for each event (maybe their batch implementation is poor), and that may be
why even the bigger batches are slow. The OS-level buffers you mentioned:
yes, they efficiently batch the physical disk writes in memory and flush
them out later. But that's a different thing; here we are just writing to
the disk and reading it back again, so as I see it, we are just using the
local disk as a buffer, where we could do the same thing in RAM. Basically,
build up sizable chunks in memory and write them to HDFS. That way we lose
the, admittedly small, overhead of writing to and reading from the local
disk, while the bottleneck remains writing the data out over the network to
a remote server's disk somewhere. Simply put, direct HDFS writes should be
able to saturate the network link we have; and even if they can't, we
should ask ourselves how writing to the local disk and reading it back
again could possibly make things faster.
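To make the chunking idea concrete, here is a rough sketch (the class name
and the flush callback are made up for illustration; in practice the
callback would be an HDFS append, and the chunk size would be ~64 MB):

```python
class ChunkedEventBuffer:
    """Buffer serialized events in RAM; hand off one big chunk at a time."""

    def __init__(self, flush_fn, chunk_size=64 * 1024 * 1024):
        self.flush_fn = flush_fn      # e.g. an HDFS append; stubbed here
        self.chunk_size = chunk_size
        self.buf = bytearray()

    def write(self, event_bytes):
        self.buf.extend(event_bytes)
        if len(self.buf) >= self.chunk_size:
            self.flush()

    def flush(self):
        # One big sequential write instead of many small record writes.
        if self.buf:
            self.flush_fn(bytes(self.buf))
            self.buf.clear()

# Usage: collect many small events, flush as a few large chunks.
chunks = []
buf = ChunkedEventBuffer(chunks.append, chunk_size=1024)  # tiny size for demo
for _ in range(100):
    buf.write(b"x" * 20)   # 20-byte events, as in the numbers below
buf.flush()                # flush the remainder explicitly
```

This is only the buffering discipline, of course; what happens to the
in-memory chunk if the receiver dies is exactly the durability question
discussed below.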

Cheers,
Anjana.

On Thu, Nov 6, 2014 at 6:15 PM, Srinath Perera <srin...@wso2.com> wrote:

> Of course we need to try it out and verify, I am just making a case that
> we should try it out :)
>
> Also, RDBMS should be the default, as most scenarios can be handled with
> DBs and there is no reason to make everyone's life complicated.
>
> --Srinath
>
> On Fri, Nov 7, 2014 at 7:44 AM, Srinath Perera <srin...@wso2.com> wrote:
>
>> 1) Anjana, you are assuming that bandwidth is the bottleneck. Let me give
>> an example.
>>
>> With sequential reads and writes, an HDD can do > 100 MB/sec and a 1G
>> network can do > 50 MB/sec.
>> But the best number we have seen from BAM is about 40k events/sec (and
>> that with 4 machines or so; let's assume one machine). Let's assume
>> 20-byte events. Then it will be doing < 1 MB/sec.
>>
>> The problem is that Cassandra breaks the data into lots of small
>> operations, losing the OS-level buffer-to-buffer transfers that file
>> transfers can do. I have tried increasing the batch size for Cassandra,
>> which helps for smaller batches. But after about a few thousand
>> operations in the same batch, things start to get much slower.
>>
>> The best numbers will come when we run two receivers instead of using NFS.
>>
>> 2) Frank, this is analytics data. So it is read-only, and in most cases
>> we need only time-based queries at low resolution (a 15-min smallest
>> resolution is fine for most cases). That is to say, "run this batch query
>> on the last hour of data", and so on.
>>
>> However, we have some scenarios where we do ad hoc queries for things
>> like activity monitoring. The above would not work for those, and we will
>> have to run a batch job to push that data to an RDBMS or Solr etc.
>> Anjana, we need to discuss this.
>>
>> But there are also a lot of use cases where we want to receive and write
>> the events to disk as soon as possible and later run MapReduce on top of
>> them. For those, the above will work.
>>
>> --Srinath
>>
>> On Fri, Nov 7, 2014 at 7:23 AM, Anjana Fernando <anj...@wso2.com> wrote:
>>
>>> Hi Sanjiva,
>>>
>>> On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana <sanj...@wso2.com>
>>> wrote:
>>>
>>>> Anjana I think the idea was for the file system -> HDFS upload to
>>>> happen via a simple cron job type thing.
>>>>
>>>
>>> Even so, we will just be moving the problem to another area; the overall
>>> work done by that hardware is still the same (writing to disk, reading
>>> it back, writing it to the network). That is, even though we can get to
>>> a very high throughput initially by writing to the local disk first,
>>> later on we have to read it back and write it to HDFS over the network,
>>> which is the slower part of the operation. So if we continue to load the
>>> machine at an extreme throughput, we will eventually run out of space on
>>> that disk.
>>>
>>> Cheers,
>>> Anjana.
>>>
>>>
>>>>
>>>> On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando <anj...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi Srinath,
>>>>>
>>>>> Wouldn't it be better if we just make the batch size bigger? That is,
>>>>> let's have a sizable local in-memory store, probably close to 64 MB,
>>>>> which is the default HDFS block size, and only flush the buffer after
>>>>> it is filled, or maybe when the receiver is idle. I was just thinking
>>>>> that writing to the file system first will itself be expensive: there
>>>>> are the additional steps of writing all the records to the local file
>>>>> system, reading them back, and then finally writing them to HDFS; and
>>>>> of course a network file system would be an overhead too, not to
>>>>> mention the implementation/configuration complications that would come
>>>>> with it. IMHO, we should try to keep these scenarios as simple as
>>>>> possible.
>>>>>
>>>>> I'm doing our new BAM data layer implementations here [1]. I'm almost
>>>>> done with an RDBMS implementation and am doing some refactoring now (a
>>>>> mail on this is yet to come :)). I can also do an HDFS one after that
>>>>> and check it.
>>>>>
>>>>> [1]
>>>>> https://github.com/wso2/carbon-analytics/tree/master/components/xanalytics
>>>>>
>>>>> Cheers,
>>>>> Anjana.
>>>>>
>>>>> On Tue, Nov 4, 2014 at 6:56 PM, Srinath Perera <srin...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> The following came out of a chat with Sanjiva on a scenario involving
>>>>>> a very large number of events coming into BAM.
>>>>>>
>>>>>> Currently we use Cassandra to store the events; the numbers we got
>>>>>> out of it have not been great, and Cassandra needs too much attention
>>>>>> to get to those numbers.
>>>>>>
>>>>>> With Cassandra (or any DB) we write data as records. We can batch
>>>>>> them, but the amount of data in one I/O operation is still small. In
>>>>>> comparison, file transfers are much, much faster, and they are the
>>>>>> fastest way to get data from A to B.
>>>>>>
>>>>>> So I am proposing to write the events that come in to a local file in
>>>>>> the Data Receiver, and periodically append them to an HDFS file. We
>>>>>> can arrange the data in a folder per stream and in files by timestamp
>>>>>> (e.g. each 1h of data goes to a new file), so we can selectively pull
>>>>>> and process data using Hive. (We can use something like
>>>>>> https://github.com/OpenHFT/Chronicle-Queue to write data to disk.)
>>>>>>
>>>>>> If a user needs to avoid losing any messages at all in case of a disk
>>>>>> failure, he can either have a SAN or NFS, or run two replicas of the
>>>>>> receivers (we should write some code so that only one of the
>>>>>> receivers will actually put data to HDFS).
>>>>>>
>>>>>> Coding-wise, this should not be too hard. I am sure this will be
>>>>>> faster than Cassandra by a good factor (of course, we need to do a
>>>>>> PoC and verify).
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> --Srinath
>>>>>>
>>>>>> --
>>>>>> ============================
>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>> Phone: 0772360902
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Anjana Fernando*
>>>>> Senior Technical Lead
>>>>> WSO2 Inc. | http://wso2.com
>>>>> lean . enterprise . middleware
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sanjiva Weerawarana, Ph.D.
>>>> Founder, Chairman & CEO; WSO2, Inc.;  http://wso2.com/
>>>> email: sanj...@wso2.com; office: (+1 650 745 4499 | +94  11 214 5345)
>>>> x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
>>>> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
>>>> Lean . Enterprise . Middleware
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
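P.S. For what it's worth, Srinath's back-of-the-envelope numbers above work
out like this (plain arithmetic, using only the figures quoted in the
thread):

```python
# 40k events/sec at 20 bytes per event, compared with the sequential
# rates quoted above for an HDD (> 100 MB/sec) and a 1G link (> 50 MB/sec).
events_per_sec = 40_000
event_size_bytes = 20

ingest_mb_per_sec = events_per_sec * event_size_bytes / 1_000_000
print(ingest_mb_per_sec)   # 0.8 MB/sec, i.e. the "< 1 MB/sec" above

hdd_seq_mb = 100   # sequential HDD write, MB/sec
net_1g_mb = 50     # 1G network, MB/sec
# Under 1% of what the disk can stream sequentially, and under 2% of the
# network link -- so raw bandwidth is clearly not the limiting factor.
```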



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
