Yicong-Huang opened a new issue, #4762:
URL: https://github.com/apache/texera/issues/4762
### What happened?
`common/workflow-core/src/main/scala/org/apache/texera/amber/util/ArrowUtils.scala::fromAttributeType`
maps `STRING`, `LARGE_BINARY`, and `ANY` all to `ArrowType.Utf8.INSTANCE`.
`LARGE_BINARY` is recovered via field metadata (`texera_type=LARGE_BINARY`) by
`toTexeraSchema`, but `ANY` carries no metadata, so a schema round-trip
(`toTexeraSchema(fromTexeraSchema(schema))`) silently turns every `ANY`
attribute into `STRING`. The cross-language schema bridge therefore loses the
`ANY` distinction entirely.
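
The loss can be illustrated without Arrow or Texera on the classpath. The following is a minimal sketch, not the actual `ArrowUtils` code: `ToyType`, `ArrowField`, `toArrow`, `fromArrow`, and `roundTrip` are hypothetical stand-ins, and only the `texera_type=LARGE_BINARY` metadata entry and the shared `Utf8` mapping are taken from the description above.

```scala
// Self-contained model of the schema round-trip; no Arrow or Texera dependency.
// All names in this object are hypothetical and exist only for illustration.
object RoundTripSketch {
  sealed trait ToyType
  case object StringT extends ToyType
  case object LargeBinaryT extends ToyType
  case object AnyT extends ToyType

  // Stand-in for an Arrow field: a physical type name plus string metadata.
  final case class ArrowField(
      name: String,
      arrowType: String,
      metadata: Map[String, String]
  )

  // All three types share one physical type (Utf8); only LARGE_BINARY is tagged.
  def toArrow(name: String, t: ToyType): ArrowField =
    t match {
      case LargeBinaryT =>
        ArrowField(name, "Utf8", Map("texera_type" -> "LARGE_BINARY"))
      case StringT | AnyT =>
        ArrowField(name, "Utf8", Map.empty)
    }

  // Recovery consults the metadata first; an untagged Utf8 field becomes STRING.
  def fromArrow(f: ArrowField): ToyType =
    f.metadata.get("texera_type") match {
      case Some("LARGE_BINARY") => LargeBinaryT
      case _                    => StringT
    }

  def roundTrip(t: ToyType): ToyType = fromArrow(toArrow("v", t))
}
```

In this model, `roundTrip(LargeBinaryT)` returns `LargeBinaryT` because the tag survives, while `roundTrip(AnyT)` returns `StringT` because nothing distinguishes the untagged `Utf8` field from a plain string.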
### How to reproduce?
```scala
import org.apache.texera.amber.core.tuple.{Attribute, AttributeType, Schema}
import org.apache.texera.amber.util.ArrowUtils
val original = Schema(List(new Attribute("v", AttributeType.ANY)))
val recovered =
  ArrowUtils.toTexeraSchema(ArrowUtils.fromTexeraSchema(original))
// recovered.getAttributes.head.getType == AttributeType.STRING (information lost)
```
### Version
1.1.0-incubating (Pre-release/Master)