mattmartin14 commented on PR #1534:
URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2635043790

   > @mattmartin14 more useful:
   > 
   > ```
   > df_1 = pd.DataFrame({'name': ["tom", "matt"]})
   > tbl_1 = pa.Table.from_pandas(df_1)
   > 
   > df_2 = pd.DataFrame({'name': ["tom", "harry"]})
   > tbl_2 = pa.Table.from_pandas(df_2)
   > 
   > mask = pa.compute.is_in(tbl_1["name"], value_set=tbl_2["name"])
   > tbl_1.filter(mask)
   > ```
   
   Thanks @tscottcoombes1 and @Fokko - got the pyarrow filter to work for identifying rows to insert (no DataFusion needed). Code for that is below. Now I'm moving on to figuring out the filter for rows to update, which will be a little trickier; a rough sketch of that follows the function below:
   
    ```python
    import pyarrow as pa
    import pyarrow.compute as pc


    def get_rows_to_insert(source_table: pa.Table, target_table: pa.Table, join_cols: list) -> pa.Table:
        source_filter_expr = None

        # AND together a per-column membership test against the target's values.
        for col in join_cols:
            target_values = target_table.column(col).to_pylist()
            expr = pc.field(col).isin(target_values)

            if source_filter_expr is None:
                source_filter_expr = expr
            else:
                source_filter_expr = source_filter_expr & expr

        # Negate to keep only the source rows whose join columns don't appear in the target.
        non_matching_expr = ~source_filter_expr

        # Restrict the result to the columns both tables share, preserving source order.
        common_columns = [
            col for col in source_table.column_names
            if col in target_table.column_names
        ]

        non_matching_rows = source_table.filter(non_matching_expr).select(common_columns)

        return non_matching_rows
    ```
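   
   For the update side, a rough starting point (the `get_rows_to_update` name is hypothetical, and the per-column `isin` approach is just illustrative) would be the mirror image of the function above: keep the source rows whose join columns *do* appear in the target. Deciding which of those rows actually changed still needs a value comparison on the non-key columns, which is the tricky part:
   
    ```python
    import pyarrow as pa
    import pyarrow.compute as pc


    def get_rows_to_update(source_table: pa.Table, target_table: pa.Table, join_cols: list) -> pa.Table:
        """Sketch: source rows whose join-column values already exist in the target
        (update candidates). Does not yet check whether the non-key values differ."""
        matching_expr = None

        for col in join_cols:
            # Same per-column membership test as in get_rows_to_insert.
            target_values = target_table.column(col).to_pylist()
            expr = pc.field(col).isin(target_values)
            matching_expr = expr if matching_expr is None else matching_expr & expr

        # Keep only the columns both tables share, preserving source order.
        common_columns = [
            col for col in source_table.column_names
            if col in target_table.column_names
        ]

        # No negation here: rows that match the target's keys are update candidates.
        return source_table.filter(matching_expr).select(common_columns)
    ```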

