[ https://issues.apache.org/jira/browse/ARROW-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Grove updated ARROW-10585:
-------------------------------
    Description:
Add join support to DataFrame and LogicalPlan.

h2. Logical Plan

My initial thoughts on the design of the LogicalPlan struct would be:

{code:java}
struct InnerJoin {
    left: Box<LogicalPlan>,
    right: Box<LogicalPlan>,
    left_keys: Vec<Expr>,
    right_keys: Vec<Expr>
}
{code}

The left_keys and right_keys vectors must have the same length. Example pseudo-code:

{code:java}
let join = InnerJoin {
    left: read_parquet("customers"),
    right: read_parquet("orders"),
    left_keys: vec![col("id")],
    right_keys: vec![col("customer_id")]
};
{code}

h2. DataFrame

{code:java}
let customer = ctx.read_parquet("customers").alias("c");
let orders = ctx.read_parquet("orders").alias("o");

// generic join method that can support all types of join
let join = customer.join(orders, col("c.id").eq(col("o.customer_id")));

// or we could start with a more specific equijoin method
let join = customer.inner_join(orders, vec![col("id")], vec![col("customer_id")]);
{code}

  was: Add join support to DataFrame and LogicalPlan

> [Rust] [DataFusion] Add join support to DataFrame and LogicalPlan
> -----------------------------------------------------------------
>
>                 Key: ARROW-10585
>                 URL: https://issues.apache.org/jira/browse/ARROW-10585
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
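
The requirement that left_keys and right_keys have the same length could be enforced when the plan node is built. The following is only a minimal sketch under stated assumptions: Expr, LogicalPlan, ParquetScan, col, read_parquet and inner_join here are simplified stand-ins written for illustration, not DataFusion's actual definitions, and InnerJoin is modelled as an enum variant rather than the standalone struct shown in the description.

```rust
// Stand-in types for illustration only; NOT DataFusion's real definitions.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
}

/// Hypothetical helper mirroring the `col("...")` calls in the proposal.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    /// Hypothetical leaf node standing in for `read_parquet(...)`.
    ParquetScan { path: String },
    /// Equijoin on pairs of key expressions: left_keys[i] = right_keys[i].
    InnerJoin {
        left: Box<LogicalPlan>,
        right: Box<LogicalPlan>,
        left_keys: Vec<Expr>,
        right_keys: Vec<Expr>,
    },
}

/// Hypothetical scan constructor; it only records the path.
fn read_parquet(path: &str) -> LogicalPlan {
    LogicalPlan::ParquetScan {
        path: path.to_string(),
    }
}

/// Builds an InnerJoin node, rejecting key lists of different lengths.
fn inner_join(
    left: LogicalPlan,
    right: LogicalPlan,
    left_keys: Vec<Expr>,
    right_keys: Vec<Expr>,
) -> Result<LogicalPlan, String> {
    if left_keys.len() != right_keys.len() {
        return Err(format!(
            "left_keys has {} expressions but right_keys has {}",
            left_keys.len(),
            right_keys.len()
        ));
    }
    Ok(LogicalPlan::InnerJoin {
        left: Box::new(left),
        right: Box::new(right),
        left_keys,
        right_keys,
    })
}

/// Usage: the customers/orders join from the description.
fn customers_orders_join() -> Result<LogicalPlan, String> {
    inner_join(
        read_parquet("customers"),
        read_parquet("orders"),
        vec![col("id")],
        vec![col("customer_id")],
    )
}
```

Returning a Result from the constructor is one possible design choice; it surfaces a mismatched key list when the plan is built rather than later during physical planning.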