Data Model Suggestion

Pariksheet Barapatre Tue, 23 Jun 2015 06:19:56 -0700

Hello All,

This is more like a HBase question but as I am planning to use Phoenix as a
access layer, I hope phoenix user will help me.


I would like to create time series data to get on-the-fly analytics.

This use case is for adTech.

Report - what is houly,daily,weekly impression counts at country level for
a given advertisement ID (ADID).

I am doing hourly aggregation and loading into a Phoenix table.

Primary Key - *ADID          | COUNTRY       | HOUR ID*

---------------------------------------------------------------------------------
*ADID          | COUNTRY       | HOUR ID*      |  CF.IMP  |
---------------------------------------------------------------------------------
1                | US                  | 2015062301  | 3000        |
1                | US                  | 2015062302  | 3421        |
1                | UK                  | 2015062302  | 1212        |
---------------------------------------------------------------------------------

Is it a good schema design or shall I create alternate schema as below
Primary Key - *ADID          | COUNTRY       | DAY ID*
----------------------------------------------------------------------------------------------------
*ADID          | COUNTRY       | DAY ID*      |  CF.IMP*01*  | CF.IMP*02*  |
----------------------------------------------------------------------------------------------------
1                | US                  | 20150623  | 3000        |
3421         |
1                | UK                  | 20150623  | NULL        |
1212          |
----------------------------------------------------------------------------------------------------
Here, I have taken hour part from hour ID and created 24 columns.

I gone through many time-series NoSQL blog posts, most the author suggest
to go with wider rows as above. This will reduce the scan, but I don't see
much difference in both Data Models in term of latency for scanning.

Can anybody please suggest good approach for my use case?


Cheers,
Pari

Data Model Suggestion

Reply via email to