hntd187 commented on issue #1544:
URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1013426317


   > I've used kafka-streams, flink and beam professionally, the point of
   > streaming is to execute windowed functions, join aggregated data with in-memory
   > tables and distribute these computations (DAGs) according to partition keys, so
   > that the right value gets sent to the right thread.
   >
   > In terms of priority, it looks to me like this would not be a reasonable
   > thing to do before having methods like `DataFrame::map_partition` and traits
   > like `SortMergeJoin`
   
   We can add that to the design if we have to implement it. We may be able to
   be dumb about it in the meantime.
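
For context, the partition-key routing described in the quote could be sketched roughly as below. This is a standalone illustration using only the Rust standard library, not DataFusion code: `partition_for` and `route_by_key` are hypothetical names, and hashing keys with `DefaultHasher` is an assumption about how keys would map to partitions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a key to one of `num_partitions` partitions (hypothetical helper).
/// Equal keys always land in the same partition within one process.
fn partition_for<K: Hash>(key: &K, num_partitions: usize) -> usize {
    assert!(num_partitions > 0, "need at least one partition");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Splits records into per-partition buckets, preserving arrival order
/// inside each bucket. Each bucket could then be handed to its own thread.
fn route_by_key<K: Hash, V>(records: Vec<(K, V)>, num_partitions: usize) -> Vec<Vec<(K, V)>> {
    let mut buckets: Vec<Vec<(K, V)>> = (0..num_partitions).map(|_| Vec::new()).collect();
    for (key, value) in records {
        let p = partition_for(&key, num_partitions);
        buckets[p].push((key, value));
    }
    buckets
}
```

Because every record with the same key ends up in the same bucket, per-key state (for example a window aggregate) only ever needs to live on one thread.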
   
   ```rust
       let ec = ExecutionContext::new();

       // Schema for the topic: the raw Kafka key and value bytes.
       let dfs = DFSchema::new(
           vec![
               DFField::new(Some("topic1"), "key", DataType::Binary, false),
               DFField::new(Some("topic1"), "value", DataType::Binary, false),
           ]
       ).unwrap();

       // `StreamingScan`, `StreamScan`, and `KafkaExecutionPlan` come from my
       // prototype; they are not part of upstream DataFusion.
       let lp = LogicalPlan::StreamingScan(StreamScan {
           topic_name: "topic1".to_string(),
           source: Arc::new(KafkaExecutionPlan {
               time_window: Duration::from_millis(200),
               topic: "topic1".to_string(),
               batch_size: 5,
               conf: consumer_config("0", None),
           }),
           schema: Arc::new(dfs),
           batch_size: Some(5),
       });

       let df = DataFrameImpl::new(ec.state, &lp);

       // Pull a single batch from the stream, giving up after 10 seconds.
       timeout(Duration::from_secs(10), async move {
           let mut b = df.execute_stream().await.unwrap();
           let batch = b.next().await;
           dbg!(batch);
       }).await.unwrap();
   ```
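
The `time_window` and `batch_size` fields above suggest a micro-batching source. Here is a minimal sketch of that idea, assuming a batch is emitted when either `batch_size` messages have arrived or `time_window` has elapsed (that semantics is an assumption, since the prototype's implementation isn't shown). A `std::sync::mpsc` channel stands in for the Kafka consumer, and `next_batch` is a hypothetical helper.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collects up to `batch_size` messages from `rx`, waiting at most
/// `time_window` in total.
///
/// * `Some(batch)`: the window closed or the batch filled up; the batch
///   may be empty if nothing arrived, and the stream is still open.
/// * `None`: the sender hung up and nothing is left to read.
fn next_batch<T>(rx: &Receiver<T>, batch_size: usize, time_window: Duration) -> Option<Vec<T>> {
    let deadline = Instant::now() + time_window;
    let mut batch = Vec::with_capacity(batch_size);
    while batch.len() < batch_size {
        // Time left in this window; zero once the deadline has passed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(msg) => batch.push(msg),
            Err(RecvTimeoutError::Timeout) => break,
            Err(RecvTimeoutError::Disconnected) => {
                if batch.is_empty() {
                    return None;
                }
                break;
            }
        }
    }
    Some(batch)
}
```

The point of the two return shapes is that an empty window and the end of the stream are different events: a quiet topic yields `Some(vec![])`, and only a closed source yields `None`.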
   
   Based on my awful, awful base implementation, this gets data back. Hooking
   this up has taken me deeper into DF than I had hoped, and I don't know whether
   this would be possible in a contrib module without a custom DataFrame impl.
   The struggle I am facing is that the DataFrame currently seems to require a
   "wait or end" signal for data reading, which for streaming obviously never
   really comes. Does anyone know where this occurs, or whether I'm on the
   right track?
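
To illustrate the mismatch: a consumer that collects every batch before returning needs the source to signal end-of-data, while one that handles each batch as it arrives does not. Below is a sketch with a plain iterator standing in for a record batch stream. `UnboundedSource` and `consume_incrementally` are hypothetical names, not DataFusion APIs.

```rust
/// Hypothetical stand-in for an unbounded topic: `next` never returns `None`.
struct UnboundedSource {
    next_value: u64,
    batch_size: usize,
}

impl Iterator for UnboundedSource {
    type Item = Vec<u64>;

    fn next(&mut self) -> Option<Self::Item> {
        let start = self.next_value;
        self.next_value += self.batch_size as u64;
        Some((start..self.next_value).collect())
    }
}

/// Handles each batch as it arrives and returns the number of rows seen.
/// The caller bounds the work with `max_batches`; a `collect()`-style
/// consumer would wait forever for the end of an unbounded source.
fn consume_incrementally<I, F>(source: I, max_batches: usize, mut on_batch: F) -> usize
where
    I: Iterator<Item = Vec<u64>>,
    F: FnMut(&[u64]),
{
    let mut rows_seen = 0;
    for batch in source.take(max_batches) {
        rows_seen += batch.len();
        on_batch(&batch);
    }
    rows_seen
}
```

Here the stopping condition lives with the consumer (a batch count, or equally a timeout as in the snippet above) rather than with the source.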



