Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-19 Thread Gene Pang
Hi Mich, While Alluxio is not a database (it exposes a file system interface), you can use Alluxio to keep certain data in memory. With Alluxio, you can selectively pin data in memory (http://www.alluxio.org/docs/master/en/Command-Line-Interface.html#pin). There are also ways to control how to

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-18 Thread Mich Talebzadeh
Good points. Well, the batch layer will be able to read streaming data from Flume files if needed using Spark csv. It may take a bit longer, but that is not the focus of the batch layer. All real-time data will go through the speed layer using Spark Streaming, where the real-time alerts/notification

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-18 Thread Jörn Franke
Ignite has a special cache for HDFS data (which is not a Java cache), for RDDs, etc. So you are right: in this sense it is very different. Besides caching, what I see from data scientists is that for interactive queries and model evaluation they do not browse the complete data anyway. Even

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-18 Thread Mich Talebzadeh
Thanks everyone for the ideas. It sounds like Ignite has been taken over by GridGain, so it becomes similar to HazelCast: open source in name only. However, an in-memory Java cache may or may not help. The other options, like faster databases, are on the table depending on who wants what (those are normally decisions

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-18 Thread Sean Owen
Alluxio isn't a database though; it's storage. I may be still harping on the wrong solution for you, but as we discussed offline, that's also what Impala, Drill et al are for. Sorry if this was mentioned before but Ignite is what GridGain became, if that helps. On Sat, Sep 17, 2016 at 11:00 PM,

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-18 Thread Jörn Franke
In Tableau you can use the in-memory facilities of the Tableau server. As said, Apache Ignite could be one way. You can also use it to make Hive tables in-memory. While reducing IO can make sense, I do not think you will see that much difference in production systems (at least not 20x). If

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-17 Thread Mich Talebzadeh
Thanks Todd. As I thought, Apache Ignite is a data fabric, much like Oracle Coherence cache or HazelCast. The use case differs between an in-memory database (IMDB) and a data fabric. The build that I am dealing with has a 'database centric' view of its data (i.e. it accesses its data using Spark

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-17 Thread Mich Talebzadeh
Thanks Todd. I will have a look. Regards, Dr Mich Talebzadeh

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-17 Thread Todd Nist
Hi Mich, have you looked at Apache Ignite? https://apacheignite-fs.readme.io/docs. This looks like something that may be what you're looking for: http://apacheignite.gridgain.org/docs/data-analysis-with-apache-zeppelin HTH. -Todd On Sat, Sep 17, 2016 at 12:53 PM, Mich Talebzadeh

Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-17 Thread Mich Talebzadeh
Hi, I am seeing issues similar to those I saw when working on Oracle with Tableau as the dashboard. Currently I have a batch layer that gets streaming data from source -> Kafka -> Flume -> HDFS. It is stored on HDFS as text files, and a cron process syncs the Hive table with the external table built on the