df.union(df2) should be supported when both DataFrames are created from a streaming source. What error are you seeing?
On Fri, Jul 7, 2017 at 11:27 AM, Lalwani, Jayesh < jayesh.lalw...@capitalone.com> wrote: > In structured streaming, Is there a way to Union 2 streaming data frames? > Are there any plans to support Union of 2 streaming dataframes soon? I can > understand the inherent complexity in joining 2 streaming data frames. But, > Union is just concatenating 2 microbatches, innit? > > > > The problem that we are trying to solve is that we have a Kafka stream > that is receiving events. Each event is assosciated with an account ID. We > have a data store that stores historical events for hundreds of millions > of accounts. What we want to do is for the events coming in the input > stream, we want to add in all the historical events from the data store and > give it to a model. > > > > Initially, the way we were planning to do this is > a) read from Kafka into a streaming dataframe. Call this inputDF. > b) In a mapWithPartition method, get all the unique accounts in the > partition. Look up all the historical events for those unique accounts and > return them. Let’s call this historicalDF > > c) Union inputDF with historicalDF. Call this allDF > > d) Call mapWithPartition on allDF and give the records to the model > > > > Of course, this doesn’t work because both inputDF and historicalDF are > streaming data frames. > > > > What we ended up doing is in step b) we output the input records with the > historical records, which works but seems like a hacky way of doing things. > The operation that does lookup does union too. This works for now because > the data from the data store doesn’t require any transformation or > aggregation. But, if it did, we would like to do that using Spark SQL, > whereas this solution forces us to doing any transformation of historical > data in Scala > > > > Is there a Sparky way of doing this? > > > ------------------------------ > > The information contained in this e-mail is confidential and/or > proprietary to Capital One and/or its affiliates and may only be used > solely in performance of work or services for Capital One. The information > transmitted herewith is intended only for use by the individual or entity > to which it is addressed. If the reader of this message is not the intended > recipient, you are hereby notified that any review, retransmission, > dissemination, distribution, copying or other use of, or taking of any > action in reliance upon this information is strictly prohibited. If you > have received this communication in error, please contact the sender and > delete the material from your computer. >