Eric Erhardt created ARROW-4997:
-----------------------------------

             Summary: [C#] ArrowStreamReader doesn't consume whole stream and 
doesn't implement sync read
                 Key: ARROW-4997
                 URL: https://issues.apache.org/jira/browse/ARROW-4997
             Project: Apache Arrow
          Issue Type: Bug
          Components: C#
            Reporter: Eric Erhardt
            Assignee: Eric Erhardt


There are 2 major issues with the ArrowStreamReader that are blocking me from 
using it.
 # When it reads a batch from a .NET Stream that doesn't return the whole chunk 
of memory in one "Read" call (like a socket/network stream), it only calls Read 
once and then continues on. This leaves "garbage" at the end of its buffer 
(bytes the stream never wrote to), and the attempt to read the next batch then 
starts in the middle of the previous batch in the .NET Stream. This causes all 
sorts of failures because the reader assumes the next 4 bytes are the message 
length, which they aren't. See [the reading 
code|https://github.com/apache/arrow/blob/13fd813445b4738cbebbd137490fe3c02071c04b/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs#L90-L97]
 for where it only calls Read once; it should be in a loop.
 # ArrowStreamReader has a synchronous ReadNextRecordBatch() method, but it 
throws NotImplementedException. A working synchronous implementation is 
necessary because a caller that isn't in an async method can't/shouldn't call 
the async API.
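
As a rough illustration of the loop suggested in the first item, and of how a synchronous path could share the same logic, here is a minimal sketch. The StreamReadHelpers class and its ReadFullBuffer / ReadFullBufferAsync methods are hypothetical names chosen for this example; they are not part of Apache.Arrow and not necessarily how the fix would be written.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not an Apache.Arrow API.
public static class StreamReadHelpers
{
    // Keeps calling Read until 'count' bytes have arrived or the stream ends.
    // Returns the number of bytes actually placed in the buffer.
    public static int ReadFullBuffer(Stream stream, byte[] buffer, int offset, int count)
    {
        int totalRead = 0;
        while (totalRead < count)
        {
            int read = stream.Read(buffer, offset + totalRead, count - totalRead);
            if (read == 0)
            {
                break; // end of stream: no more data will arrive
            }
            totalRead += read;
        }
        return totalRead;
    }

    // Async counterpart with the same loop, built on Stream.ReadAsync.
    public static async Task<int> ReadFullBufferAsync(
        Stream stream, byte[] buffer, int offset, int count,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        int totalRead = 0;
        while (totalRead < count)
        {
            int read = await stream
                .ReadAsync(buffer, offset + totalRead, count - totalRead, cancellationToken)
                .ConfigureAwait(false);
            if (read == 0)
            {
                break; // end of stream
            }
            totalRead += read;
        }
        return totalRead;
    }
}
```

A caller would compare the returned count with the expected message length and treat a shorter result as a truncated stream rather than continuing to parse the buffer.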



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
