hntd187 commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1013426317
> I've used kafka-streams, flink and beam professionally, the point of streaming is to execute windowed functions, join aggregated data with in-memory tables and distribute these computations (DAGs) according to partition keys, so that the right value gets sent to the right thread. > > In terms of priority, it looks to me like this would not be a reasonable thing to do before having methods like `DataFrame::map_partition` and traits like `SortMergeJoin` We can add that to the design if we have to implement it. We maybe able to be dumb about it in the meantime. ```rust let ec = ExecutionContext::new(); let dfs = DFSchema::new( vec![ DFField::new(Some("topic1"), "key", DataType::Binary, false), DFField::new(Some("topic1"), "value", DataType::Binary, false), ] ).unwrap(); let lp = LogicalPlan::StreamingScan(StreamScan { topic_name: "topic1".to_string(), source: Arc::new(KafkaExecutionPlan { time_window: Duration::from_millis(200), topic: "topic1".to_string(), batch_size: 5, conf: consumer_config("0", None), }), schema: Arc::new(dfs), batch_size: Some(5), }); let df = DataFrameImpl::new(ec.state, &lp); timeout(Duration::from_secs(10), async move { let mut b = df.execute_stream().await.unwrap(); let batch = b.next().await; dbg!(batch); }).await.unwrap(); ``` Based on my awful awful base implementation this gets data back. Hooking this up has taken me into the depths of DF more than I hoped, I don't know if outside of a custom DataFrame impl this would be possible in a contrib module. I guess the struggle I am facing here is it seems like the dataframe right now has to get a "wait or end" of data reading notification, where obviously for streaming this really never comes. Is anyone more readily aware of where this occurs or if I'm on the right track? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org