Hi Mich,

While Alluxio is not a database (it exposes a file system interface), you can use Alluxio to keep certain data in memory. With Alluxio, you can selectively pin data in memory (http://www.alluxio.org/docs/master/en/Command-Line-Interface.html#pin). There are also ways to control how data is read from and written to Alluxio memory (http://www.alluxio.org/docs/master/en/File-System-API.html). These options and features can help you control how you access your data.
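For illustration only, here is a minimal Python sketch of how pinning might be scripted around the `alluxio fs pin` command from the Command-Line-Interface page linked above. The path `/warehouse/sales` and the helper names `pin_command` and `pin` are hypothetical, and the sketch assumes the `alluxio` launcher is on the PATH. By default it only returns the command line instead of running it.

```python
import subprocess


def pin_command(path, alluxio_bin="alluxio"):
    """Build the argument list for `alluxio fs pin <path>`."""
    # Alluxio paths are absolute within its namespace.
    if not path.startswith("/"):
        raise ValueError("expected an absolute Alluxio path, e.g. /warehouse/sales")
    return [alluxio_bin, "fs", "pin", path]


def pin(path, alluxio_bin="alluxio", dry_run=True):
    """Return the pin command line; run it only when dry_run is False."""
    cmd = pin_command(path, alluxio_bin)
    if not dry_run:
        # Assumption: the `alluxio` launcher is installed and on the PATH.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Hypothetical directory backing a Hive table.
    print(pin("/warehouse/sales"))
```

Call `pin(path, dry_run=False)` to actually execute the command on a machine where Alluxio is installed.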
Thanks,
Gene

On Sat, Sep 17, 2016 at 9:53 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> I saw similar issues when I was working on Oracle with Tableau as
> the dashboard.
>
> Currently I have a batch layer that gets streaming data from
>
> source -> Kafka -> Flume -> HDFS
>
> It is stored on HDFS as text files, and a cron process sinks a Hive table
> with the external table built on the directory. I tried both ORC and
> Parquet, but I don't think the query itself is the issue.
>
> Meaning it does not matter how clever your execution engine is: the fact
> that you still have to do a considerable amount of Physical IO (PIO), as
> opposed to Logical IO (LIO), to get the data to Zeppelin is on the
> critical path.
>
> One option is to limit the amount of data in Zeppelin to a certain number
> of rows or something similar. However, you cannot tell a user that he/she
> cannot see the full data.
>
> We resolved this with Oracle by using Oracle TimesTen
> <http://www.oracle.com/technetwork/database/database-technologies/timesten/overview/index.html>
> IMDB to cache certain tables in memory and get them refreshed (depending
> on refresh frequency) from the underlying table in Oracle when data is
> updated. That is done through cache fusion.
>
> I was looking around and came across Alluxio <http://www.alluxio.org/>.
> Ideally I would like to utilise a concept like TimesTen. Can one
> distribute Hive table data (or any table data) cached across the nodes?
> In that case we would be doing Logical IO, which is about 20 times or
> more lightweight compared to Physical IO.
>
> Anyway, this is the concept.
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com