[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2020-05-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1692:
--
Labels: columnar-format-1.0 pull-request-available  (was: columnar-format-1.0)

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Components: Integration, Java, Python
> Reporter: Philipp Moritz
> Assignee: Ryan Murray
> Priority: Blocker
> Labels: columnar-format-1.0, pull-request-available
> Fix For: 1.0.0
>
> Attachments: union_array.arrow
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side. One problem I ran into is that the Java implementation seems 
> unable to read UnionArrays generated from C++. To make this easily 
> reproducible, I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> # Child arrays of the union
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> # Per-slot child selector and offset into the selected child
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> # Serialize the batch in the Arrow stream format
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
>     f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
>     b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
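
For readers unfamiliar with the dense union layout used by the script above, here is a minimal pure-Python sketch of how each slot is resolved from the `types` and `value_offsets` arrays. It does not use pyarrow; `resolve_dense_union` is a hypothetical helper written for illustration, and it assumes that each type id is simply the index of the child array, as in the script.

```python
def resolve_dense_union(children, type_ids, value_offsets):
    """Materialize a dense union as a plain Python list (illustration only)."""
    if len(type_ids) != len(value_offsets):
        raise ValueError("type_ids and value_offsets must have equal length")
    # Slot i takes its value from child type_ids[i] at index value_offsets[i].
    # Assumption: the type id is the position of the child in `children`.
    return [children[t][o] for t, o in zip(type_ids, value_offsets)]


# Same data as in the reproduction script, as plain Python lists.
binary = [b"a", b"b", b"c", b"d"]
int64 = [1, 2, 3]
types = [0, 1, 0, 0, 1, 1, 0]
value_offsets = [0, 0, 2, 1, 1, 2, 3]

values = resolve_dense_union([binary, int64], types, value_offsets)
print(values)
```

The union has seven slots, while the children only hold four and three values; a reader has to consult the offsets to find each value.
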
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in =
>     new ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch();
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error message: can not truncate buffer to a larger size 7: 0
> |        at VectorLoader.loadBuffers (VectorLoader.java:83)
> |        at VectorLoader.load (VectorLoader.java:62)
> |        at ArrowReader$1.visit (ArrowReader.java:125)
> |        at ArrowReader$1.visit (ArrowReader.java:111)
> |        at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |        at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |        at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: 0
> |        at VectorLoader.loadBuffers (VectorLoader.java:83)
> |        at VectorLoader.load (VectorLoader.java:62)
> |        at ArrowReader$1.visit (ArrowReader.java:125)
> |        at ArrowReader$1.visit (ArrowReader.java:111)
> |        at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |        at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |        at (#8:1)
> {code}
> Any help with this is appreciated!
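
The first error reports the field as `Union(Sparse, ...)`, although the script builds a dense union. The sketch below illustrates why that matters: in the Arrow columnar format, every child of a sparse union has the same length as the union, whereas a dense union carries one offset per slot and its children may be shorter. `layout_problems` is a hypothetical helper for illustration only; it is not the check performed by the Java `VectorLoader`, and it is only an assumption that a mismatch of this kind is behind the exception.

```python
def layout_problems(mode, length, child_lengths, offsets=None):
    """Return a list of layout inconsistencies for a union of `length` slots."""
    problems = []
    if mode == "sparse":
        # Sparse union: every child must be as long as the union itself.
        for i, n in enumerate(child_lengths):
            if n != length:
                problems.append(f"child {i} has length {n}, expected {length}")
    elif mode == "dense":
        # Dense union: children may be shorter, but each slot needs an offset.
        if offsets is None or len(offsets) != length:
            problems.append("dense union needs one offset per slot")
    else:
        raise ValueError(f"unknown union mode: {mode!r}")
    return problems


# Shape of the data from the reproduction script: 7 slots, children of 4 and 3.
child_lengths = [4, 3]
value_offsets = [0, 0, 2, 1, 1, 2, 3]

as_sparse = layout_problems("sparse", 7, child_lengths)
as_dense = layout_problems("dense", 7, child_lengths, offsets=value_offsets)
print(as_sparse)
print(as_dense)
```

Interpreted as sparse, both children look too short for a seven-slot union; interpreted as dense, the same data is consistent.
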



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2020-02-03 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-1692:
---
Component/s: Integration

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Components: Integration, Java, Python
> Reporter: Philipp Moritz
> Priority: Blocker
> Labels: columnar-format-1.0
> Fix For: 1.0.0
>
> Attachments: union_array.arrow
>
>


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2020-01-07 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: 1.0.0  (was: 0.16.0)

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java, Python
> Reporter: Philipp Moritz
> Assignee: Micah Kornfield
> Priority: Blocker
> Labels: columnar-format-1.0
> Fix For: 1.0.0
>
> Attachments: union_array.arrow
>
>


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2020-01-06 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-1692:
---
Priority: Blocker  (was: Major)

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java, Python
> Reporter: Philipp Moritz
> Assignee: Micah Kornfield
> Priority: Blocker
> Labels: columnar-format-1.0
> Fix For: 1.0.0
>
> Attachments: union_array.arrow
>
>


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2019-06-11 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: 1.0.0  (was: 0.14.0)

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java, Python
> Reporter: Philipp Moritz
> Assignee: Micah Kornfield
> Priority: Major
> Labels: columnar-format-1.0
> Fix For: 1.0.0
>
> Attachments: union_array.arrow
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2019-04-17 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-1692:
--
Component/s: Python, Java

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java, Python
> Reporter: Philipp Moritz
> Assignee: Micah Kornfield
> Priority: Major
> Labels: columnar-format-1.0
> Fix For: 0.14.0
>
> Attachments: union_array.arrow
>
>


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2019-02-05 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: 0.14.0  (was: 0.13.0)

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Philipp Moritz
> Priority: Major
> Labels: columnar-format-1.0
> Fix For: 0.14.0
>
> Attachments: union_array.arrow
>
>


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2018-11-14 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: 0.13.0  (was: 0.12.0)

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Philipp Moritz
> Priority: Major
> Labels: columnar-format-1.0
> Fix For: 0.13.0
>
> Attachments: union_array.arrow
>
>


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1692:
---
Fix Version/s: 0.12.0  (was: 0.11.0)

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Philipp Moritz
> Priority: Major
> Labels: columnar-format-1.0
> Fix For: 0.12.0
>
> Attachments: union_array.arrow
>
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side. One problem I ran into is that the Java implementation does not 
> seem to be able to read UnionArrays generated from C++. To make this easily 
> reproducible, I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
>     f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
>     b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in = new 
> ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch()
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error 
> message: can not truncate buffer to a larger size 7: 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: > 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#8:1)
> {code}
> Any help with this is appreciated!





[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: (was: 0.10.0)
   0.11.0

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.11.0
>
> Attachments: union_array.arrow
>
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side. One problem I ran into is that the Java implementation does not 
> seem to be able to read UnionArrays generated from C++. To make this easily 
> reproducible, I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
>     f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
>     b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in = new 
> ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch()
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error 
> message: can not truncate buffer to a larger size 7: 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: > 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#8:1)
> {code}
> Any help with this is appreciated!





[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2018-06-29 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Labels: columnar-format-1.0  (was: )

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.11.0
>
> Attachments: union_array.arrow
>
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side. One problem I ran into is that the Java implementation does not 
> seem to be able to read UnionArrays generated from C++. To make this easily 
> reproducible, I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
>     f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
>     b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in = new 
> ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch()
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error 
> message: can not truncate buffer to a larger size 7: 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: > 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#8:1)
> {code}
> Any help with this is appreciated!





[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2018-02-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: (was: 0.9.0)
   0.10.0

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Priority: Major
> Fix For: 0.10.0
>
> Attachments: union_array.arrow
>
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side. One problem I ran into is that the Java implementation does not 
> seem to be able to read UnionArrays generated from C++. To make this easily 
> reproducible, I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
>     f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
>     b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in = new 
> ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch()
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error 
> message: can not truncate buffer to a larger size 7: 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: > 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#8:1)
> {code}
> Any help with this is appreciated!





[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2017-12-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1692:

Fix Version/s: 0.9.0

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
> Fix For: 0.9.0
>
> Attachments: union_array.arrow
>
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side. One problem I ran into is that the Java implementation does not 
> seem to be able to read UnionArrays generated from C++. To make this easily 
> reproducible, I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
>     f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
>     b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in = new 
> ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch()
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error 
> message: can not truncate buffer to a larger size 7: 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: > 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#8:1)
> {code}
> Any help with this is appreciated!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2017-10-19 Thread Philipp Moritz (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-1692:
--
Description: 
I'm currently working on making pyarrow.serialization data available from the 
Java side. One problem I ran into is that the Java implementation does not seem 
to be able to read UnionArrays generated from C++. To make this easily 
reproducible, I created a clean Python implementation for creating UnionArrays: 
https://github.com/apache/arrow/pull/1216

The data is generated with the following script:

{code}
import pyarrow as pa

binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
int64 = pa.array([1, 2, 3], type='int64')
types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')

result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)

batch = pa.RecordBatch.from_arrays([result], ["test"])

sink = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(sink, batch.schema)

writer.write_batch(batch)

sink.close()

b = sink.get_result()

with open("union_array.arrow", "wb") as f:
    f.write(b)

# Sanity check: Read the batch in again

with open("union_array.arrow", "rb") as f:
    b = f.read()
reader = pa.RecordBatchStreamReader(pa.BufferReader(b))

batch = reader.read_next_batch()

print("union array is", batch.column(0))
{code}
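
For readers less familiar with the layout, here is a minimal pure-Python sketch of how a dense union like the one above is read: element i comes from child `types[i]` at position `value_offsets[i]`. This is an illustration only, not pyarrow's implementation; the helper name `resolve_dense_union` is hypothetical, and plain lists stand in for the Arrow arrays from the script.

```python
def resolve_dense_union(children, types, value_offsets):
    """Return the logical values of a dense union as a Python list.

    children      -- list of child value lists (stand-ins for the child arrays)
    types         -- per-element index of the child that holds the value
    value_offsets -- per-element position inside the selected child
    """
    if len(types) != len(value_offsets):
        raise ValueError("types and value_offsets must have the same length")
    return [children[t][o] for t, o in zip(types, value_offsets)]


# Same values as in the script above, as plain Python lists.
binary = [b'a', b'b', b'c', b'd']
int64 = [1, 2, 3]
types = [0, 1, 0, 0, 1, 1, 0]
value_offsets = [0, 0, 2, 1, 1, 2, 3]

logical = resolve_dense_union([binary, int64], types, value_offsets)
print(logical)  # [b'a', 1, b'c', b'b', 2, 3, b'd']
```

Note that the children have 4 and 3 elements while the union itself has 7, which is only possible in the dense layout.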

I attached the file generated by that script. Then when I run the following 
code in Java:

{code}
RootAllocator allocator = new RootAllocator(10);

ByteArrayInputStream in = new 
ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));

ArrowStreamReader reader = new ArrowStreamReader(in, allocator);

reader.loadNextBatch()
{code}

I get the following error:

{code}
|  java.lang.IllegalArgumentException thrown: Could not load buffers for field 
test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error message: can 
not truncate buffer to a larger size 7: 0
|at VectorLoader.loadBuffers (VectorLoader.java:83)
|at VectorLoader.load (VectorLoader.java:62)
|at ArrowReader$1.visit (ArrowReader.java:125)
|at ArrowReader$1.visit (ArrowReader.java:111)
|at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
|at ArrowReader.loadNextBatch (ArrowReader.java:137)
|at (#7:1)
{code}

It seems like Java is not picking up that the UnionArray is Dense instead of 
Sparse. After changing the default in 
java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, I 
get this:

{code}
jshell> reader.getVectorSchemaRoot().getSchema()
$9 ==> Schema
{code}

but then reading doesn't work:

{code}
jshell> reader.loadNextBatch()
|  java.lang.IllegalArgumentException thrown: Could not load buffers for field 
list: Union(Dense, [1])<: Struct>>>. error message: can not truncate buffer to a larger size 1: 0
|at VectorLoader.loadBuffers (VectorLoader.java:83)
|at VectorLoader.load (VectorLoader.java:62)
|at ArrowReader$1.visit (ArrowReader.java:125)
|at ArrowReader$1.visit (ArrowReader.java:111)
|at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
|at ArrowReader.loadNextBatch (ArrowReader.java:137)
|at (#8:1)
{code}
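
As background on the Dense versus Sparse distinction mentioned above, the following is a rough pure-Python sketch of the structural difference as described in the Arrow columnar format: in a sparse union every child is as long as the union itself, while in a dense union the children may be shorter and a separate offsets array points into them. The function name `compatible_union_modes` is hypothetical, and this is not a description of how the Java reader makes its decision.

```python
def compatible_union_modes(num_rows, child_lengths, has_offsets):
    """List the union modes a given set of array lengths could represent.

    num_rows      -- logical length of the union (length of the types array)
    child_lengths -- lengths of the child arrays
    has_offsets   -- whether a value-offsets array is present
    """
    modes = []
    # Sparse: no offsets, and every child spans all rows of the union.
    if not has_offsets and all(n == num_rows for n in child_lengths):
        modes.append("sparse")
    # Dense: an offsets array selects positions, so children may be shorter.
    if has_offsets:
        modes.append("dense")
    return modes


# Lengths taken from the reproduction script: 7 rows, children of 4 and 3.
modes = compatible_union_modes(7, [4, 3], has_offsets=True)
as_sparse = compatible_union_modes(7, [4, 3], has_offsets=False)
print(modes, as_sparse)
```

With the lengths from the reproduction script, the data only fits the dense layout, so a reader that treats the field as sparse would be expecting children of length 7.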

Any help with this is appreciated!

  was:
I'm currently working on making pyarrow.serialization data available from the 
Java side. One problem I ran into is that the Java implementation does not seem 
to be able to read UnionArrays generated from C++. To make this easily 
reproducible, I created a clean Python implementation for creating UnionArrays: 
https://github.com/apache/arrow/pull/1216

The data is generated with the following script:

```
import pyarrow as pa

binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
int64 = pa.array([1, 2, 3], type='int64')
types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')

result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)

batch = pa.RecordBatch.from_arrays([result], ["test"])

sink = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(sink, batch.schema)

writer.write_batch(batch)

sink.close()

b = sink.get_result()

with open("union_array.arrow", "wb") as f:
f.write(b)

# Sanity check: Read the batch in again

with open("union_array.arrow", "rb") as f:
b = f.read()
reader = pa.RecordBatchStreamReader(pa.BufferReader(b))

batch = reader.read_next_batch()

print("union array is", batch.column(0))
```

I attached the file generated by that script. Then when I run the following 
code in Java:

```
RootAllocator allocator = new RootAllocator(10);

ByteArrayInputStream in = new 
ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));

ArrowStreamReader reader = new ArrowStreamReader(in, allocator);

reader.loadNextBatch()
```

I get the following error:

```
|  ja