emanueledomingo opened a new issue, #568:
URL: https://github.com/apache/arrow-datafusion-python/issues/568

   Hi Everyone,
   
   I'm not sure this is the right place for this; it's more of a question than a real bug.
   
   **Describe the bug**
   I tried to generate the logical plan of a query through Substrait instead of passing the query text to `ctx.sql()`. Serializing the query to a Substrait plan works, but converting that plan back into a DataFusion logical plan fails. Meanwhile, `ctx.sql()` runs the same query without any problem.
   
   What is the reason behind this behavior?
   
   **To Reproduce**
   
   ```py
   import datafusion
   from datafusion.substrait import substrait as ss
   import pyarrow as pa
   import pyarrow.dataset as pda
   from faker import Faker
   
   print(f"DF: {datafusion.__version__}\nPA: {pa.__version__}")  # DF: 32.0.0 PA: 14.0.2
   
   fake = Faker()
   
   N_ROWS = 1_000
   
   dummy_table = pa.Table.from_pydict(
       {
           "id": range(N_ROWS),
           "name": (fake.name() for _ in range(N_ROWS)),
           "country_code": (fake.country_code() for _ in range(N_ROWS)),
       }
   )
   
   q = """
   SELECT
       "t1".*
       , "t2".*
   FROM "table" "t1"
   INNER JOIN "table" "t2"
       ON "t1"."id" = CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END
   """
   
   ctx = datafusion.SessionContext()
   ctx.register_dataset(name="table", dataset=pda.dataset(dummy_table))
   
   df = ctx.sql(q)
   default_plan = df.logical_plan()
   
   plan = ss.serde.serialize_to_plan(q, ctx)
   logical_plan = ss.consumer.from_substrait_plan(ctx, plan)  # <- Exception here
   df = ctx.create_dataframe_from_logical_plan(plan=logical_plan)
   ss_plan = df.logical_plan()
   ```
   
   Exception is:
   
   ```
   ---------------------------------------------------------------------------
   Exception                                 Traceback (most recent call last)
   Cell In[6], line 2
         1 plan = ss.serde.serialize_to_plan(q2, ctx)
   ----> 2 logical_plan = ss.consumer.from_substrait_plan(ctx, plan)
         3 df = ctx.create_dataframe_from_logical_plan(plan=logical_plan)
         4 ss_plan = df.logical_plan()
   
   Exception: DataFusion error: Plan("invalid join condition expression")
   ```
   
   **Expected behavior**
   
   The Substrait round trip should succeed and produce the same logical plan as `ctx.sql(q)`:
   ```py
   assert ss_plan == default_plan
   # True
   ```
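The join condition in the query compares the plain column `"t1"."id"` with a *computed* expression on `t2`, not with another plain column. The pure-Python sketch below does not use DataFusion. It mirrors the repro with ids `0..N_ROWS-1` on both sides, and the helper names such as `case_key` are hypothetical. It evaluates the original `ON` condition pair by pair and checks that this yields exactly the same rows as first projecting the `CASE` result into its own key and then joining on plain key equality. If the Substrait consumer only accepts column-to-column equalities in join conditions (an assumption, not confirmed in this issue), that second shape could suggest a workaround.

```python
from collections import defaultdict

# Hypothetical stand-in for the repro data: ids 0..N_ROWS-1 in both t1 and t2.
N_ROWS = 1_000
ids = list(range(N_ROWS))


def case_key(t2_id: int) -> int:
    """Python mirror of: CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END."""
    return t2_id if t2_id < 10 else 10


def nested_loop_join(left: list[int], right: list[int]) -> list[tuple[int, int]]:
    """Evaluate the original ON condition for every (t1.id, t2.id) pair."""
    return sorted(
        (t1_id, t2_id) for t1_id in left for t2_id in right if t1_id == case_key(t2_id)
    )


def equi_join_on_derived_key(left: list[int], right: list[int]) -> list[tuple[int, int]]:
    """Project the CASE result into its own key first, then hash-join on
    plain key equality (the column = column shape of the same join)."""
    by_key: defaultdict[int, list[int]] = defaultdict(list)
    for t2_id in right:
        by_key[case_key(t2_id)].append(t2_id)
    return sorted((t1_id, t2_id) for t1_id in left for t2_id in by_key.get(t1_id, []))


pairs = nested_loop_join(ids, ids)
# Both strategies return exactly the same (t1.id, t2.id) pairs.
assert pairs == equi_join_on_derived_key(ids, ids)
print(len(pairs))  # 1000: 10 pairs with t2.id < 10, plus 990 pairs matching t1.id = 10
```

In SQL terms, the second function corresponds to computing the `CASE` expression in a subquery (for example as a hypothetical `join_key` column) and joining on `"t1"."id" = "t2"."join_key"`. Whether DataFusion 32's Substrait consumer accepts that rewritten query has not been verified here.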

