Boris V.Kuznetsov created ARROW-6133: ----------------------------------------
Summary: Schema Missing Exception in ArrowStreamReader
Key: ARROW-6133
URL: https://issues.apache.org/jira/browse/ARROW-6133
Project: Apache Arrow
Issue Type: Bug
Components: Java
Affects Versions: 0.14.1
Reporter: Boris V.Kuznetsov

Hello,

My colleague and I are trying to pass Arrow data through Kafka. He uses PyArrow; I'm using the Java API from Scala.

Here's the transmitter code:
{code:python}
import pyarrow as pa

def record_batch_to_bytes(df):
    batch = pa.RecordBatch.from_pandas(df)
    ser_ = pa.serialize(batch)
    return bytes(ser_.to_buffer())
{code}

My colleague is able to read this stream with the Python API:
{code:python}
def bytes_to_batch_record(bytes_):
    batch = pa.deserialize(bytes_)
    print(batch.schema)
{code}

On the receiver side, I use the following Java API code:
{code:scala}
def deserialize(din: Chunk[BArr]): Chunk[ArrowStreamReader] =
  for {
    arr <- din
    stream = new ByteArrayInputStream(arr)
  } yield new ArrowStreamReader(stream, allocator)
{code}
{code:scala}
reader = deserialize(arr)
schema = reader.map(r => r.getVectorSchemaRoot.getSchema)
empty  = reader.map(r => r.loadNextBatch)
{code}

Both lines 2 and 3 in the last snippet fail with this exception:
{noformat}
Fiber failed.
An unchecked error was produced.
java.io.IOException: Unexpected end of input. Missing schema.
    at org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:135)
    at org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:178)
    at org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:169)
    at org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:62)
    at nettest.ArrowSpec.$anonfun$testConsumeArrow$7(Arrow.scala:96)
    at zio.Chunk$Arr.map(Chunk.scala:722)
{noformat}

The full Scala code is [here|https://github.com/Clover-Group/zio-tsp/blob/46e34c7c060bf4061067922077bbe05ea4b9f301/src/test/scala/Arrow.scala#L95].

How do I resolve this? We are both using Arrow 0.14.1, and my colleague has no issues with the PyArrow API.

Thank you!

--
This message was sent by Atlassian JIRA (v7.6.14#76016)