You could write your views to Hive or maybe Tachyon. Is the periodically updated data big?
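A minimal sketch of that idea, assuming a Hive-enabled SparkSession; the table name "ref_data" and the JDBC connection options are hypothetical placeholders. A separate refresher process overwrites a Hive table, which any other Spark session can then read back:

    import org.apache.spark.sql.SparkSession

    // Refresher process: re-reads the source table and overwrites a Hive
    // table so the refreshed data is visible across Spark sessions.
    val spark = SparkSession.builder()
      .appName("RefDataRefresher")
      .enableHiveSupport()
      .getOrCreate()

    val latest = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb") // hypothetical
      .option("dbtable", "ref_table")                      // hypothetical
      .load()

    // Any other session can read the latest snapshot via spark.table("ref_data").
    latest.write.mode("overwrite").saveAsTable("ref_data")

Whether an already-running streaming query picks up the overwritten table on later micro-batches is not guaranteed, so this is best combined with the view-refresh idea discussed further down the thread.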
Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Fri, 21 Apr 2017 at 16:55:

> Being new to Spark, I think I need your suggestion again.
>
> > #2 you can always define a batch Dataframe and register it as a view,
> > and then run a background thread that periodically creates a new
> > Dataframe with updated data and re-registers it as a view with the same
> > name
>
> I seem to have misunderstood your statement and tried registering a static
> dataframe as a temp view ("myTempView") using createOrReplaceTempView in
> one Spark session, and tried re-registering another refreshed dataframe as
> a temp view with the same name ("myTempView") in another session. However,
> with this approach, I have failed to achieve what I'm aiming for, because
> temp views are local to one Spark session.
>
> From Spark 2.1.0 onwards, the global view is a nice feature, but it still
> would not solve my problem, because a global view cannot be updated.
>
> So after much thinking, I understood that you meant running a background
> process within the same Spark job, which would periodically create a new
> dataframe and re-register the temp view with the same name, within the
> same Spark session.
>
> Could you please give me some pointers to documentation on how to create
> such an asynchronous background process in Spark Structured Streaming? Is
> Scala's "Futures" the way to achieve this?
>
> Thanks,
> Hemanth
>
> From: Tathagata Das <tathagata.das1...@gmail.com>
> Date: Friday, 21 April 2017 at 0.03
> To: Hemanth Gudela <hemanth.gud...@qvantel.com>
> Cc: Georg Heiler <georg.kf.hei...@gmail.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: Spark structured streaming: Is it possible to periodically
> refresh static data frame?
>
> Here are a couple of ideas.
>
> 1. You can set up a Structured Streaming query to update an in-memory
> table. Look at the memory sink in the programming guide -
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks
> So you can query the latest table using a specified table name, and also
> join that table with another stream. However, note that this in-memory
> table is maintained in the driver, so you have to be careful about the
> size of the table.
>
> 2. If you cannot define a streaming query on the slow-moving data due to
> the unavailability of a streaming connector for that data source, then you
> can always define a batch Dataframe and register it as a view, and then
> run a background thread that periodically creates a new Dataframe with
> updated data and re-registers it as a view with the same name. Any
> streaming query that joins a streaming dataframe with the view will
> automatically start using the most updated data as soon as the view is
> updated.
>
> Hope this helps.
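A minimal sketch of idea #2, with hypothetical JDBC/Kafka connection options and a hypothetical join column "key". On the asynchronous part: a Scala Future runs once and completes, so a ScheduledExecutorService (or a loop inside a Future) is a more natural fit for a recurring refresh:

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("StreamJoinRefreshedView")
      .getOrCreate()

    // Re-reads the database table and re-registers it under the same view
    // name, in the same SparkSession the streaming query runs in.
    def refreshView(): Unit = {
      val snapshot = spark.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/mydb") // hypothetical
        .option("dbtable", "ref_table")                      // hypothetical
        .load()
      snapshot.createOrReplaceTempView("refView")
    }

    refreshView() // register once before the streaming query starts

    // Background refresh every 5 minutes, per the suggestion above.
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = refreshView()
    }, 5, 5, TimeUnit.MINUTES)

    // Streaming side: Kafka source, casting key/value to strings.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafkahost:9092") // hypothetical
      .option("subscribe", "events")                       // hypothetical
      .load()
      .selectExpr("CAST(key AS STRING) AS key",
                  "CAST(value AS STRING) AS value")

    events.createOrReplaceTempView("events")

    val enriched = spark.sql(
      "SELECT e.key, e.value, r.* FROM events e JOIN refView r ON e.key = r.key")

    enriched.writeStream.format("console").start().awaitTermination()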
> On Thu, Apr 20, 2017 at 1:30 PM, Hemanth Gudela
> <hemanth.gud...@qvantel.com> wrote:
>
> Thanks Georg for your reply.
> But I'm not sure I fully understood your answer.
>
> If you meant joining two streams (one reading Kafka, and another reading a
> database table), then I think it's not possible, because
>
> 1. According to the documentation
> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#data-sources>,
> Structured Streaming does not support a database as a streaming source.
> 2. Joining two streams is not possible yet.
>
> Regards,
> Hemanth
>
> From: Georg Heiler <georg.kf.hei...@gmail.com>
> Date: Thursday, 20 April 2017 at 23.11
> To: Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: Spark structured streaming: Is it possible to periodically
> refresh static data frame?
>
> What about treating the static data as a (slow) stream as well?
>
> Hemanth Gudela <hemanth.gud...@qvantel.com> wrote on Thu, 20 Apr 2017 at
> 22:09:
>
> Hello,
>
> I am working on a use case where there is a need to join a streaming data
> frame with a static data frame.
> The streaming data frame continuously gets data from Kafka topics, whereas
> the static data frame fetches data from a database table.
>
> However, as the underlying database table is updated often, I must somehow
> manage to refresh my static data frame periodically to get the latest
> information from the underlying database table.
>
> My questions:
> 1. Is it possible to periodically refresh the static data frame?
> 2. If refreshing the static data frame is not possible, is there a
> mechanism to automatically stop & restart the Spark Structured Streaming
> job, so that every time the job restarts, the static data frame gets
> updated with the latest information from the underlying database table?
> 3. If 1) and 2) are not possible, please suggest alternatives to achieve
> my requirement described above.
>
> Thanks,
> Hemanth
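For context, a minimal sketch of the stream-static join being described, again with hypothetical connection options and a hypothetical join column "key". The static side is defined once when the query is set up, so updates to the underlying database table are not guaranteed to show up in later micro-batches, which is the problem this thread is about:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("StreamStaticJoin")
      .getOrCreate()

    // Streaming side: Kafka topic, key cast to a string join column.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafkahost:9092") // hypothetical
      .option("subscribe", "events")                       // hypothetical
      .load()
      .selectExpr("CAST(key AS STRING) AS key",
                  "CAST(value AS STRING) AS value")

    // Static side: a one-off batch read from the database over JDBC.
    val static = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb") // hypothetical
      .option("dbtable", "ref_table")                      // hypothetical
      .load()

    // Stream-static joins are supported in Spark 2.1; stream-stream joins
    // are not, which rules out treating the database as a second stream.
    val enriched = stream.join(static, Seq("key"))

    enriched.writeStream.format("console").start().awaitTermination()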