Thanks all for actively sharing your experience.

@Chris: using something like Redis is something I am still trying to figure out. I have a lot of transactions, so I can't trigger an update event for every single transaction. I'm looking at Spark Streaming because it provides micro-batch processing (e.g. I can update the cache every 5 seconds). In addition, Spark scales pretty well and I don't have to worry about losing data.
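A minimal sketch of the micro-batch idea, written in plain Python rather than Spark so it runs standalone. Incoming transactions are buffered. Each flush (e.g. every 5 seconds) first sums the batch per key and then merges those partial sums into the cached totals, so each cache key is written at most once per batch instead of once per transaction. The key (Date, BranchID, ProductID) and the TotalQty/TotalDollar values follow the cache layout described in this message. The names `Transaction`, `MicroBatchStats`, `add` and `flush` are hypothetical, and `price` is assumed to be a unit price.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record type; fields mirror the thread's Transaction table.
    date: str          # e.g. "2016-03-13"
    branch_id: int
    product_id: int
    qty: int
    price: Decimal     # assumption: unit price, so dollars = qty * price


class MicroBatchStats:
    """Buffers transactions and folds them into running totals once per batch."""

    def __init__(self):
        self.buffer = []
        # (date, branch_id, product_id) -> [total_qty, total_dollar]
        self.stats = {}

    def add(self, tx):
        # Cheap per-transaction step: no cache write here.
        self.buffer.append(tx)

    def flush(self):
        """Aggregate the buffered batch, merge it into the cache, and
        return the number of cache keys touched by this batch."""
        partial = defaultdict(lambda: [0, Decimal("0")])
        for tx in self.buffer:
            key = (tx.date, tx.branch_id, tx.product_id)
            partial[key][0] += tx.qty
            partial[key][1] += tx.qty * tx.price
        # One write per key per batch, regardless of how many transactions hit it.
        for key, (qty, dollars) in partial.items():
            total = self.stats.setdefault(key, [0, Decimal("0")])
            total[0] += qty
            total[1] += dollars
        self.buffer.clear()
        return len(partial)
```

In a Spark Streaming job the same two steps correspond roughly to `reduceByKey` within each batch and `updateStateByKey` across batches, with the batch interval (e.g. `StreamingContext(sc, 5)`) playing the role of the flush timer.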
Now the cache holds the following columns (* = part of the key):

  *Date
  *BranchID
  *ProductID
  TotalQty
  TotalDollar

Note that I have history data as well (by day). Now I want to use Zeppelin to query the cache (while the cache is being updated). I don't need Zeppelin to update automatically (I can hit the run button myself :) ). Just curious whether Parquet is the right solution for us?

On Sun, Mar 13, 2016 at 3:25 PM, Chris Miller <cmiller11...@gmail.com> wrote:

> Cool! Thanks for sharing.
>
>
> --
> Chris Miller
>
> On Sun, Mar 13, 2016 at 12:53 AM, Todd Nist <tsind...@gmail.com> wrote:
>
>> Below is a link to an example which Silvio Fiorito put together
>> demonstrating how to link Zeppelin with Spark Streaming for real-time charts.
>> I think the original thread was back in early November 2015, subject: Real
>> time chart in Zeppelin, if you care to try to find it.
>>
>> https://gist.github.com/granturing/a09aed4a302a7367be92
>>
>> HTH.
>>
>> -Todd
>>
>> On Sat, Mar 12, 2016 at 6:21 AM, Chris Miller <cmiller11...@gmail.com>
>> wrote:
>>
>>> I'm pretty new to all of this stuff, so bear with me.
>>>
>>> Zeppelin isn't really intended for realtime dashboards as far as I know.
>>> Its reporting features (tables, graphs, etc.) are more for displaying the
>>> results from the output of something. As far as I know, there isn't really
>>> anything to "watch" a dataset and have updates pushed to the Zeppelin UI.
>>>
>>> As for Spark, unless you're doing a lot of processing that you didn't
>>> mention here, I don't think it's a good fit just for this.
>>>
>>> If it were me (just off the top of my head), I'd just build a simple web
>>> service that uses websockets to push updates to the client, which could then
>>> be used to update graphs, tables, etc. The data itself -- that is, the
>>> accumulated totals -- you could store in something like Redis. When an
>>> order comes in, just add that quantity and price to the existing value and
>>> trigger your code to push out an updated value to any clients via the
>>> websocket. You could use something like a Redis pub/sub channel to trigger
>>> the web app to notify clients of an update.
>>>
>>> There are about 5 million other ways you could design this, but I would
>>> just keep it as simple as possible. I just threw one idea out...
>>>
>>> Good luck.
>>>
>>>
>>> --
>>> Chris Miller
>>>
>>> On Sat, Mar 12, 2016 at 6:58 PM, trung kien <kient...@gmail.com> wrote:
>>>
>>>> Thanks Chris and Mich for replying.
>>>>
>>>> Sorry for not explaining my problem clearly. Yes, I am talking about a
>>>> flexible dashboard when I mention Zeppelin.
>>>>
>>>> Here is the problem I am having:
>>>>
>>>> I am running a commercial website where we sell many products, and we
>>>> have many branches in many places. We have a lot of realtime transactions
>>>> and want to analyze them in realtime.
>>>>
>>>> We don't want to aggregate every single transaction each time we do
>>>> analytics (each transaction has BranchID, ProductID, Qty, Price). So we
>>>> maintain intermediate data which contains: BranchID, ProductID, totalQty,
>>>> totalDollar.
>>>>
>>>> Ideally, we have 2 tables:
>>>> Transaction (BranchID, ProductID, Qty, Price, Timestamp)
>>>>
>>>> And the intermediate table Stats is just the sum of every transaction
>>>> grouped by BranchID and ProductID (I am using Spark Streaming to calculate
>>>> this table in realtime).
>>>>
>>>> My thinking is that doing statistics (a realtime dashboard) on the Stats
>>>> table is much easier, and this table is also easy enough to maintain.
>>>>
>>>> I'm just wondering, what's the best way to store the Stats table (a
>>>> database or a Parquet file)?
>>>>
>>>> What exactly are you trying to do? Zeppelin is for interactive analysis
>>>> of a dataset. What do you mean by "realtime analytics" -- do you mean build
>>>> a report or dashboard that automatically updates as new data comes in?
>>>>
>>>>
>>>> --
>>>> Chris Miller
>>>>
>>>> On Sat, Mar 12, 2016 at 3:13 PM, trung kien <kient...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've just watched some of Zeppelin's videos. The integration between
>>>>> Zeppelin and Spark is really amazing and I want to use it for my
>>>>> application.
>>>>>
>>>>> In my app, I will have a Spark Streaming app to do some basic realtime
>>>>> aggregation (intermediate data). Then I want to use Zeppelin to do some
>>>>> realtime analytics on the intermediate data.
>>>>>
>>>>> My question is: what's the most efficient storage engine to store
>>>>> realtime intermediate data? Is a Parquet file somewhere suitable?
>>>>>
>>>>
>>>>
>>>
>>
>
--
Thanks,
Kien
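For comparison, Chris's per-order suggestion in the quoted thread (add each order's quantity and price to the running totals, then notify clients over a pub/sub channel) can be sketched in memory as follows. This is only an illustration under stated assumptions: the in-process callback list stands in for a Redis pub/sub channel, the dict stands in for Redis hashes (in Redis itself this would roughly be `HINCRBY`/`HINCRBYFLOAT` followed by `PUBLISH`), and the names `RunningTotals`, `subscribe` and `on_order` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


class RunningTotals:
    """In-memory stand-in for the Redis design: every order immediately
    bumps the running totals and publishes the new value to subscribers."""

    def __init__(self):
        # (branch_id, product_id) -> {"qty": ..., "dollars": ...}
        self.totals = defaultdict(lambda: {"qty": 0, "dollars": Decimal("0")})
        # Callbacks playing the role of pub/sub subscribers (e.g. a websocket layer).
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def on_order(self, branch_id, product_id, qty, price):
        key = (branch_id, product_id)
        entry = self.totals[key]
        entry["qty"] += qty
        entry["dollars"] += qty * price   # assumption: price is a unit price
        # Publish a snapshot copy so subscribers don't see later mutations.
        for callback in self.subscribers:
            callback(key, dict(entry))
```

The trade-off Kien raises is visible here: every order causes one write and one notification, whereas a micro-batch approach touches each key at most once per batch interval.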