timsaucer opened a new issue, #647:
URL: https://github.com/apache/datafusion-python/issues/647
**Describe the bug**
When calling `show()` on a DataFrame that applies a built-in window function
to a column of struct type, it fails with the error
`Compute error: concat requires input of at least one array`. Other
operations, such as `count()`, run on the same DataFrame without issue. I am
fairly new to DataFusion and assumed that `count()` performs a full
evaluation as it does in PySpark. That assumption may be wrong, and it may or
may not have any bearing on this error.
**To Reproduce**
This minimal example shows the window function working on a column of a
simple type (`b`) and failing on a column of a very simple struct type (`a`).
```
import pyarrow as pa
from datafusion import SessionContext
import datafusion.functions as F


# taken from datafusion/tests/test_dataframe.py
def struct_df():
    ctx = SessionContext()

    # create a RecordBatch and a new DataFrame from it
    batch = pa.RecordBatch.from_arrays(
        [pa.array([{"c": 1}, {"c": 2}, {"c": 3}]), pa.array([4, 5, 6])],
        names=["a", "b"],
    )
    return ctx.create_dataframe([[batch]])


df = struct_df()
df.show()

# lag over the integer column "b"
df.select(
    F.col("a"), F.col("b"), F.window("lag", [F.col("b")]).alias("lag_b")
).show()

# lag over the struct column "a", evaluated with count()
lag_a_df = df.select(
    F.col("a"), F.col("b"), F.window("lag", [F.col("a")]).alias("lag_a")
)
print("Calling count on lag a: ", lag_a_df.count())

# lag over the struct column "a", evaluated with show()
df.select(
    F.col("a"), F.col("b"), F.window("lag", [F.col("a")]).alias("lag_a")
).show()
```
Produces the following output:
```
DataFrame()
+--------+---+
| a      | b |
+--------+---+
| {c: 1} | 4 |
| {c: 2} | 5 |
| {c: 3} | 6 |
+--------+---+
DataFrame()
+--------+---+-------+
| a      | b | lag_b |
+--------+---+-------+
| {c: 1} | 4 |       |
| {c: 2} | 5 | 4     |
| {c: 3} | 6 | 5     |
+--------+---+-------+
Calling count on lag a: 3
Traceback (most recent call last):
  File "/Users/tsaucer/src/arrow-datafusion-python/example_lag_struct.py", line 25, in <module>
    df.select(F.col("a"), F.col("b"), F.window("lag", [F.col("a")]).alias("lag_a")).show()
Exception: Arrow error: Compute error: concat requires input of at least one array
```
While searching the web, I found a similar error that this older pull request
resolved for sort operations:
https://github.com/apache/arrow/pull/9275/files#diff-3ee8e6ac2472badc7bb448c360f56ed60f06a787d1f45ea589d9e213eaf2ae82
**Expected behavior**
Calling `show()` on a window function over a struct column should behave the
same way it does for columns of simple types.
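To make the expected result concrete, here is a minimal pure-Python sketch of what `lag_a` would presumably contain, assuming `lag` uses an offset of 1 with no default value, no partitioning, and rows in input order. `expected_lag` is a hypothetical helper written only for illustration; it is not part of DataFusion or PyArrow, and the struct values are modeled as plain dicts.

```python
def expected_lag(values, offset=1, default=None):
    """Hypothetical reference for lag: shift values down by `offset` rows.

    Assumes offset >= 0. The first `offset` rows receive `default`.
    """
    head = [default] * min(offset, len(values))
    tail = list(values[: max(len(values) - offset, 0)])
    return head + tail


# Same data as the reproducer, with struct rows modeled as dicts.
a = [{"c": 1}, {"c": 2}, {"c": 3}]
b = [4, 5, 6]

lag_a = expected_lag(a)
lag_b = expected_lag(b)

print(lag_a)
print(lag_b)
```

Under these assumptions `lag_b` matches the `lag_b` column that `show()` already prints for the integer case, and `lag_a` is the same shift applied to the struct values.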
**Additional context**
I'm willing to work on this myself, but I'm not familiar with the internals
of plan execution. I've looked around for anything obvious, but nothing has
jumped out at me. Any directions or pointers would be appreciated.