Hi Georg,

Yes, that should be possible with Alluxio. (Tachyon was renamed to Alluxio.)
This article on how Alluxio is used for a Spark streaming use case
<https://www.alluxio.com/blog/qunar-performs-real-time-data-analytics-up-to-300x-faster-with-alluxio>
may be helpful.

Thanks,
Gene

On Fri, Apr 21, 2017 at 8:22 AM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:

> You could write your views to Hive, or maybe Tachyon.
>
> Is the periodically updated data big?
>
> Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Fri, 21 Apr 2017 at 16:55:
>
>> Being new to Spark, I think I need your suggestion again.
>>
>> "#2: you can always define a batch DataFrame and register it as a view,
>> and then run a background thread that periodically creates a new DataFrame
>> with updated data and re-registers it as a view with the same name"
>>
>> I seem to have misunderstood your statement. I tried registering a static
>> dataframe as a temp view ("myTempView") using createOrReplaceTempView in
>> one Spark session, and then tried re-registering a refreshed dataframe as
>> a temp view with the same name ("myTempView") in another session. That
>> approach failed to achieve what I'm aiming for, because temp views are
>> local to one Spark session.
>>
>> From Spark 2.1.0 onwards, the global view is a nice feature, but it still
>> would not solve my problem, because a global view cannot be updated.
>>
>> So after much thinking, I understood that you meant a background process
>> in the same Spark job that periodically creates a new dataframe and
>> re-registers the temp view with the same name, within the same Spark
>> session.
>>
>> Could you please give me some pointers to documentation on how to create
>> such an asynchronous background process in Spark streaming? Is Scala's
>> "Futures" the way to achieve this?
>>
>> Thanks,
>> Hemanth
>>
>> From: Tathagata Das <tathagata.das1...@gmail.com>
>> Date: Friday, 21 April 2017 at 0.03
>> To: Hemanth Gudela <hemanth.gud...@qvantel.com>
>> Cc: Georg Heiler <georg.kf.hei...@gmail.com>, "user@spark.apache.org"
>> <user@spark.apache.org>
>> Subject: Re: Spark structured streaming: Is it possible to periodically
>> refresh static data frame?
>>
>> Here are a couple of ideas.
>>
>> 1. You can set up a Structured Streaming query to update an in-memory
>> table. Look at the memory sink in the programming guide -
>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks
>> You can then query the latest table using the specified table name, and
>> also join that table with another stream. However, note that this
>> in-memory table is maintained in the driver, so you have to be careful
>> about the size of the table.
>>
>> 2. If you cannot define a streaming query on the slow-moving data, due to
>> the unavailability of a connector for that data source, then you can
>> always define a batch DataFrame and register it as a view, and then run a
>> background thread that periodically creates a new DataFrame with updated
>> data and re-registers it as a view with the same name. Any streaming query
>> that joins a streaming dataframe with the view will automatically start
>> using the most updated data as soon as the view is updated.
>>
>> Hope this helps.
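To make idea #1 concrete, here is a minimal sketch (not from this thread;
the socket source is only a stand-in for whatever streaming connector you
actually have, and the table name is illustrative):

    // Keep the latest slow-moving data in a driver-side in-memory table.
    val slowStream = spark.readStream
      .format("socket")                  // stand-in streaming source
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val query = slowStream.writeStream
      .format("memory")                  // memory sink: table lives on the driver
      .queryName("lookup")               // name used to query the table later
      .outputMode("append")
      .start()

    // Any other query can now read the continuously updated table by name
    // and join it with a streaming dataframe:
    val lookupDf = spark.table("lookup")

And a minimal sketch of idea #2, which also answers the Futures question
above: a plain Scala Future runs only once, so a scheduled executor (or a
loop in a daemon thread) is a simpler fit for periodic refreshes. The JDBC
URL, table name and refresh interval below are illustrative:

    import java.util.concurrent.{Executors, TimeUnit}

    // Re-read the slow-moving table as a batch dataframe.
    def loadLatest() = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb") // illustrative
      .option("dbtable", "lookup_table")
      .load()

    loadLatest().createOrReplaceTempView("myTempView")     // initial registration

    // Periodically re-register the view with fresh data, in the SAME
    // SparkSession that runs the streaming query.
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit =
        loadLatest().createOrReplaceTempView("myTempView")
    }, 5, 5, TimeUnit.MINUTES)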
>> On Thu, Apr 20, 2017 at 1:30 PM, Hemanth Gudela
>> <hemanth.gud...@qvantel.com> wrote:
>>
>> Thanks Georg for your reply, but I'm not sure I fully understood your
>> answer.
>>
>> If you meant joining two streams (one reading Kafka, and another reading a
>> database table), then I think that is not possible, because:
>>
>> 1. According to the documentation
>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#data-sources>,
>> Structured Streaming does not support a database as a streaming source.
>> 2. Joining two streams is not possible yet.
>>
>> Regards,
>> Hemanth
>>
>> From: Georg Heiler <georg.kf.hei...@gmail.com>
>> Date: Thursday, 20 April 2017 at 23.11
>> To: Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org"
>> <user@spark.apache.org>
>> Subject: Re: Spark structured streaming: Is it possible to periodically
>> refresh static data frame?
>>
>> What about treating the static data as a (slow) stream as well?
>>
>> Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Thu, 20 Apr 2017 at
>> 22:09:
>>
>> Hello,
>>
>> I am working on a use case where there is a need to join a streaming data
>> frame with a static data frame. The streaming data frame continuously
>> receives data from Kafka topics, whereas the static data frame fetches
>> data from a database table.
>>
>> However, as the underlying database table is updated often, I must somehow
>> refresh the static data frame periodically to pick up the latest
>> information from the underlying database table.
>>
>> My questions:
>> 1. Is it possible to periodically refresh the static data frame?
>> 2. If refreshing the static data frame is not possible, is there a
>> mechanism to automatically stop and restart the Spark structured streaming
>> job, so that every time the job restarts, the static data frame is rebuilt
>> with the latest information from the underlying database table?
>> 3. If neither 1) nor 2) is possible, please suggest alternatives to
>> achieve the requirement described above.
>>
>> Thanks,
>> Hemanth
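For reference, a minimal sketch of the stream-static join described above
(the topic, JDBC details and the join column "id" are illustrative, and the
Kafka source needs the spark-sql-kafka package on the classpath):

    // Streaming side: events arriving from Kafka.
    val streamDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS id",
                  "CAST(value AS STRING) AS payload")

    // Static side: read once from the database; this is the frame that
    // goes stale as the underlying table changes.
    val staticDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")
      .option("dbtable", "lookup_table")
      .load()

    // Stream-static join: staticDf is fixed as of the moment it was defined,
    // which is exactly the refresh problem discussed in this thread.
    val joined = streamDf.join(staticDf, Seq("id"))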