Hi all,
I am in the process of migrating a relational table to HBase.
The current table records user access logs:
id : PK
userId
url
timestamp
refer_url
ip_address
cc : country code of the IP address
My potential queries would be:
- grab all pages visited by a user
- generate a report of country : number of page views
I want to understand the implications of different HBase
schema designs and how they might affect these queries.
A) A similar case study
(http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies)
recommends using the timestamp as the key (actually timestamp + counter).
timestamp + counter => {
    user => {
        userId
    }
    url => {
        url:
    }
    from => {
        ip_address
        referer_url
        cc
    }
}
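
To make option A concrete for myself, here is a rough Python sketch. It only models the table as an in-memory map with sorted row keys; it is not HBase client code. The key layout (13-digit zero-padded millisecond timestamp plus a 4-digit counter) and the helper names make_key, scan_time_range and pages_for_user are my own assumptions. If I understand correctly, a time-range read is a contiguous slice of the sorted keys, while "all pages visited by a user" has to look at every row.

```python
from bisect import bisect_left

def make_key(ts_ms, counter):
    # Zero-pad so lexicographic order matches numeric order (assumed layout).
    return f"{ts_ms:013d}-{counter:04d}"

# In-memory stand-in for the table: row key -> {family: {qualifier: value}}.
rows = {
    make_key(1247000000000, 0): {"user": {"userId": "u1"}, "url": {"url": "/a"}},
    make_key(1247000000000, 1): {"user": {"userId": "u2"}, "url": {"url": "/b"}},
    make_key(1247000005000, 0): {"user": {"userId": "u1"}, "url": {"url": "/c"}},
}
sorted_keys = sorted(rows)  # HBase stores rows sorted by row key

def scan_time_range(start_ms, stop_ms):
    # Rows with start_ms <= timestamp < stop_ms form one contiguous slice.
    lo = bisect_left(sorted_keys, make_key(start_ms, 0))
    hi = bisect_left(sorted_keys, make_key(stop_ms, 0))
    return sorted_keys[lo:hi]

def pages_for_user(user_id):
    # There is no key prefix to seek to, so every row is inspected.
    return [rows[k]["url"]["url"] for k in sorted_keys
            if rows[k]["user"]["userId"] == user_id]
```

So with this layout the per-user query looks like a full table scan unless a second, user-keyed table is maintained alongside.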
B) I am wondering if I can use the user as the key. Since there are
going to be multiple log entries per user, one possibility might be
'userid + timestamp' as the key.
I have seen this thread:
http://www.nabble.com/Advice-on-table-design-td21110283.html#a21110283
userid + timestamp => {
    url => {
        url:
    }
    from => {
        ip_address
        referer_url
        cc
    }
}
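
Here is how I picture option B, again as a rough Python sketch over an in-memory sorted map rather than real HBase calls. The fixed-width key layout (10-digit user id followed by a 13-digit millisecond timestamp), the integer user ids, the flattened column values and the function names are all assumptions of mine. The per-user query becomes a prefix scan, but the country report still touches every row.

```python
from bisect import bisect_left
from collections import Counter

def make_key(user_id, ts_ms):
    # Fixed-width fields keep one user's rows adjacent and in time order.
    return f"{user_id:010d}{ts_ms:013d}"

# Simplified stand-in: row key -> flattened columns (families omitted).
rows = {
    make_key(42, 1247000000000): {"url": "/a", "cc": "US"},
    make_key(42, 1247000005000): {"url": "/b", "cc": "US"},
    make_key(7, 1247000001000): {"url": "/a", "cc": "JP"},
}
sorted_keys = sorted(rows)

def pages_for_user(user_id):
    # Seek to the first key with this user's prefix, stop when it changes.
    prefix = f"{user_id:010d}"
    start = bisect_left(sorted_keys, prefix)
    pages = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        pages.append(rows[key]["url"])
    return pages

def views_by_country():
    # The key gives no help here, so this walks the whole table.
    counts = Counter(rows[key]["cc"] for key in sorted_keys)
    return dict(counts)
```

If this picture is right, the country report would need either a full scan (for example a batch job) or a separate counter table keyed by cc.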
C) I am also wondering whether, since cells are versioned with
timestamps, I can use the versions to represent multiple requests from
the same user to the same URL:
userid => {
    url => {
    }
    from => {
        ip_address
        ...
    }
}
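
My worry with option C, sketched in Python below, is how versions behave. This is only a toy model of a versioned cell, not HBase itself: I am assuming the URL becomes the column qualifier in the url family, that the number of retained versions is capped (in HBase the cap is configured per column family; the value 3 here is just for illustration), and the names put and get_versions are mine. Two requests with the same timestamp collapse into one, and older requests fall off once the cap is reached.

```python
from collections import defaultdict

MAX_VERSIONS = 3  # assumed cap for illustration; set per column family in HBase

# table[row][column] -> {timestamp: value}
table = defaultdict(lambda: defaultdict(dict))

def put(row, column, ts_ms, value):
    versions = table[row][column]
    versions[ts_ms] = value  # a write with the same timestamp overwrites
    # Drop everything except the newest MAX_VERSIONS timestamps.
    for old_ts in sorted(versions)[:-MAX_VERSIONS]:
        del versions[old_ts]

def get_versions(row, column):
    # Return the stored values, newest first.
    cell = table[row][column]
    return [cell[ts] for ts in sorted(cell, reverse=True)]
```

For example, four visits by one user to the same URL leave only the three newest versions in this model, so the full access history would be lost unless the cap is raised very high.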
Any suggestions are most appreciated.
Thanks,
SM