Hi,
I'm wondering whether there is an efficient way to continuously append
new data to a registered Spark SQL table.
This is what I want: to build an ad-hoc query service over a
JSON-formatted system log. The log is, of course, continuously generated.
I will use Spark Streaming to consume the system log as my input, and I want to
find a way to efficiently append the new data to an existing Spark SQL table.
Furthermore, I want the whole table cached in memory/Tachyon.
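For reference, this is roughly what I'm attempting (a sketch against the Spark 1.x API; the host, port, and table name "syslog" are just placeholders):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SQLContext

val ssc = new StreamingContext(sc, Seconds(10))
val sqlContext = new SQLContext(sc)

// Receive raw JSON log lines from the log source (placeholder endpoint).
val lines = ssc.socketTextStream("loghost", 9999)

lines.foreachRDD { rdd =>
  // Infer a schema from the JSON lines of this micro-batch.
  val batch = sqlContext.jsonRDD(rdd)
  // Append the micro-batch to the registered table.
  batch.insertInto("syslog")
}
ssc.start()
```

This compiles in spirit, but as far as I can tell insertInto only works for certain table types, which is exactly my problem.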
It looks like Spark SQL supports the "INSERT" statement, but only for
Parquet files. In addition, inserting a single row at a time is inefficient.
I do know that somebody has built a system like the one I want (an ad-hoc
query service over a growing system log), so there must be an efficient way.
Does anyone know how?