Re: [VOTE] Release Apache Arrow 0.7.1 - RC1
+1 (non-binding)

built and ran unit tests for Java, C++
ran integration tests
verified signature

On Wed, Sep 27, 2017 at 6:45 PM, Bryan Cutler wrote:
> +1 (non-binding)
>
> built and ran unit tests for C++, Python, Java
> ran integration tests
>
> On Wed, Sep 27, 2017 at 7:12 AM, Wes McKinney wrote:
> > +1 (binding)
> >
> > Using dev/release/verify-release-candidate.sh, I:
> >
> > * Verified signature, checksum on Linux
> > * Ran C++, Python (+ Parquet support), C, Java, JS (node 6.11.3) unit tests
> >
> > Using dev/release/verify-release-candidate.bat on Windows / Visual Studio 2015, I:
> >
> > * Ran C++ and Python unit tests (+ Parquet support)
> >
> > @Kou, we might want to add "cold start" instructions for people who
> > want to use the release verification script, and who may not have the
> > requisite libraries (Ruby and system C libraries) to build the GLib
> > bindings and run the unit tests.
> >
> > On Wed, Sep 27, 2017 at 10:01 AM, Wes McKinney wrote:
> > > Hello all,
> > >
> > > I'd like to propose the 2nd release candidate (rc1) of Apache
> > > Arrow version 0.7.1. This is a bugfix release from 0.7.0. The only
> > > difference between rc1 and rc0 was fixing an issue with the source
> > > release build for Windows users.
> > >
> > > The source release rc1 is hosted at [1].
> > >
> > > This release candidate is based on commit
> > > 0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit
> > > tests, and vote on the release. Consider using the release
> > > verification scripts in [4].
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow 0.7.1
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
> > >
> > > Thanks,
> > > Wes
> > >
> > > How to validate a release signature:
> > > https://httpd.apache.org/dev/verification.html
> > >
> > > [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
> > > [2]: https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > > [3]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > > [4]: https://github.com/apache/arrow/tree/master/dev/release
[jira] [Created] (ARROW-1622) [Plasma] Plasma doesn't compile with XCode 9
Philipp Moritz created ARROW-1622:
-------------------------------------

             Summary: [Plasma] Plasma doesn't compile with XCode 9
                 Key: ARROW-1622
                 URL: https://issues.apache.org/jira/browse/ARROW-1622
             Project: Apache Arrow
          Issue Type: Bug
          Components: Plasma (C++)
            Reporter: Philipp Moritz

Compiling the latest Arrow with the following flags:

```
cmake -DARROW_PLASMA=on ..
make
```

we get this error:

```
[ 61%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/client.cc.o
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/plasma/client.cc:20:
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/plasma/client.h:31:
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/plasma/common.h:30:
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/arrow/util/logging.h:22:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/iostream:38:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/ios:216:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/__locale:18:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/mutex:189:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/__mutex_base:17:
/Library/Developer/CommandLineTools/usr/include/c++/v1/__threading_support:156:1: error: unknown type name 'mach_port_t'
mach_port_t __libcpp_thread_get_port();
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/__threading_support:300:1: error: unknown type name 'mach_port_t'
mach_port_t __libcpp_thread_get_port() {
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/__threading_support:301:12: error: use of undeclared identifier 'pthread_mach_thread_np'
return pthread_mach_thread_np(pthread_self());
^
3 errors generated.
make[2]: *** [src/plasma/CMakeFiles/plasma_objlib.dir/client.cc.o] Error 1
make[1]: *** [src/plasma/CMakeFiles/plasma_objlib.dir/all] Error 2
make: *** [all] Error 2
```

The problem was discovered and diagnosed in https://github.com/apache/arrow/pull/1139

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
Re: [VOTE] Release Apache Arrow 0.7.1 - RC1
+1 (non-binding)

built and ran unit tests for C++, Python, Java
ran integration tests

On Wed, Sep 27, 2017 at 7:12 AM, Wes McKinney wrote:
> +1 (binding)
>
> Using dev/release/verify-release-candidate.sh, I:
>
> * Verified signature, checksum on Linux
> * Ran C++, Python (+ Parquet support), C, Java, JS (node 6.11.3) unit tests
>
> Using dev/release/verify-release-candidate.bat on Windows / Visual Studio 2015, I:
>
> * Ran C++ and Python unit tests (+ Parquet support)
>
> @Kou, we might want to add "cold start" instructions for people who
> want to use the release verification script, and who may not have the
> requisite libraries (Ruby and system C libraries) to build the GLib
> bindings and run the unit tests.
>
> On Wed, Sep 27, 2017 at 10:01 AM, Wes McKinney wrote:
> > Hello all,
> >
> > I'd like to propose the 2nd release candidate (rc1) of Apache
> > Arrow version 0.7.1. This is a bugfix release from 0.7.0. The only
> > difference between rc1 and rc0 was fixing an issue with the source
> > release build for Windows users.
> >
> > The source release rc1 is hosted at [1].
> >
> > This release candidate is based on commit
> > 0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit
> > tests, and vote on the release. Consider using the release
> > verification scripts in [4].
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow 0.7.1
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
> >
> > Thanks,
> > Wes
> >
> > How to validate a release signature:
> > https://httpd.apache.org/dev/verification.html
> >
> > [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
> > [2]: https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > [3]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > [4]: https://github.com/apache/arrow/tree/master/dev/release
[jira] [Created] (ARROW-1621) Reduce Heap Usage per Vector
Siddharth Teotia created ARROW-1621:
-----------------------------------

             Summary: Reduce Heap Usage per Vector
                 Key: ARROW-1621
                 URL: https://issues.apache.org/jira/browse/ARROW-1621
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Java - Memory, Java - Vectors
            Reporter: Siddharth Teotia

https://docs.google.com/document/d/1MU-ah_bBHIxXNrd7SkwewGCOOexkXJ7cgKaCis5f-PI/edit
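The linked design document is not reproduced here, but one generic technique for cutting per-vector heap cost — sketched below in Python purely as an illustration, not Arrow's actual Java implementation — is to allocate auxiliary structures lazily, so a vector that never needs them pays no heap cost up front. The `Vector` class and its `_validity` field are hypothetical names for this sketch.

```python
class Vector:
    """Illustrative only: a vector that defers allocating its null-tracking
    structure until the first null is actually set."""

    def __init__(self):
        # Created on first null, not up front, so all-non-null vectors
        # carry no extra heap object.
        self._validity = None

    def set_null(self, index):
        if self._validity is None:
            self._validity = set()
        self._validity.add(index)

    def is_null(self, index):
        return self._validity is not None and index in self._validity


v = Vector()
print(v.is_null(0))   # False, and no validity structure was allocated
v.set_null(3)
print(v.is_null(3))   # True
```

The same idea applies to any per-vector bookkeeping object: deferring allocation trades a null check per access for a smaller steady-state heap.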
Re: ArrowFileReader failing to read bytes written to Java output stream
Hi Andrew,

I do not see the attached code, maybe the attachments got stripped? Is it
small enough to just inline in the message?

Bryan

On Wed, Sep 27, 2017 at 12:26 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <apha...@bloomberg.net> wrote:
> Also for reference, this is apparently the Arrow Schema used by the
> ArrowFileWriter to write to the output stream (given by
> root.getSchema().toString() and root.getSchema().toJson()):
>
> Schema {
>   "fields" : [ {
>     "name" : "price",
>     "nullable" : true,
>     "type" : { "name" : "floatingpoint", "precision" : "DOUBLE" },
>     "children" : [ ],
>     "typeLayout" : { "vectors" : [
>       { "type" : "VALIDITY", "typeBitWidth" : 1 },
>       { "type" : "DATA", "typeBitWidth" : 64 } ] }
>   }, {
>     "name" : "numShares",
>     "nullable" : true,
>     "type" : { "name" : "int", "bitWidth" : 32, "isSigned" : true },
>     "children" : [ ],
>     "typeLayout" : { "vectors" : [
>       { "type" : "VALIDITY", "typeBitWidth" : 1 },
>       { "type" : "DATA", "typeBitWidth" : 32 } ] }
>   } ]
> }
>
> Given our bytes (wrapped by a SeekableByteChannel), the reader is unable
> to obtain the schema from this. Any ideas as to what could be happening?
> Cheers!
>
> From: dev@arrow.apache.org At: 09/26/17 18:59:18
> To: Andrew Pham (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
> Subject: Re: ArrowFileReader failing to read bytes written to Java output stream
>
> Andrew,
>
> Seems like it fails to read the schema. It has not reached the data part yet.
> Can you share your reader/writer code?
>
> On Tue, Sep 26, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <apha...@bloomberg.net> wrote:
> > Hello there, I've written something that behaves similarly to:
> >
> > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73
> >
> > except that, for proof-of-concept purposes, it transforms Java objects
> > with data into a byte[] payload. The ArrowFileWriter log statements
> > indicate that data is getting written to the output stream:
> >
> > 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 6
> > 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 2
> > 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> > 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 288
> > 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> > 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
> > 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 24
> > 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
> > 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 12
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 216
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 24
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 12
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> > 17:53:16.772 [main] DEBUG org.apache.arrow.vector.file.ArrowWriter - RecordBatch at 304, metadata: 224, body: 56
> >
> > However, when I wrap that payload into a ByteArrayReadableSeekableByteChannel
> > and use ArrowFileReader (along with a BufferAllocator) to read it,
> > ArrowFileReader is complaining that it's reading an invalid format, right
> > at the point where I call reader.getVectorSchemaRoot():
> >
> > Exception in thread "main" org.apache.arrow.vector.file.InvalidArrowFileException:
> > missing Magic number [0, 0, 42, 0, 0, 0, 0, 0, 0, 0]
> >     at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:66)
> >     at
[jira] [Created] (ARROW-1620) Python: Download Boost in manylinux1 build from bintray
Uwe L. Korn created ARROW-1620:
-------------------------------

             Summary: Python: Download Boost in manylinux1 build from bintray
                 Key: ARROW-1620
                 URL: https://issues.apache.org/jira/browse/ARROW-1620
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Uwe L. Korn
             Fix For: 0.8.0

Sourceforge often fails, so use the alternative source. See also
https://github.com/ray-project/ray/pull/1019 for this.
Re: ArrowFileReader failing to read bytes written to Java output stream
Also for reference, this is apparently the Arrow Schema used by the
ArrowFileWriter to write to the output stream (given by
root.getSchema().toString() and root.getSchema().toJson()):

Schema {
  "fields" : [ {
    "name" : "price",
    "nullable" : true,
    "type" : { "name" : "floatingpoint", "precision" : "DOUBLE" },
    "children" : [ ],
    "typeLayout" : { "vectors" : [
      { "type" : "VALIDITY", "typeBitWidth" : 1 },
      { "type" : "DATA", "typeBitWidth" : 64 } ] }
  }, {
    "name" : "numShares",
    "nullable" : true,
    "type" : { "name" : "int", "bitWidth" : 32, "isSigned" : true },
    "children" : [ ],
    "typeLayout" : { "vectors" : [
      { "type" : "VALIDITY", "typeBitWidth" : 1 },
      { "type" : "DATA", "typeBitWidth" : 32 } ] }
  } ]
}

Given our bytes (wrapped by a SeekableByteChannel), the reader is unable
to obtain the schema from this. Any ideas as to what could be happening?
Cheers!

From: dev@arrow.apache.org At: 09/26/17 18:59:18
To: Andrew Pham (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
Subject: Re: ArrowFileReader failing to read bytes written to Java output stream

Andrew,

Seems like it fails to read the schema. It has not reached the data part yet.
Can you share your reader/writer code?

On Tue, Sep 26, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <apha...@bloomberg.net> wrote:
> Hello there, I've written something that behaves similarly to:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73
>
> except that, for proof-of-concept purposes, it transforms Java objects
> with data into a byte[] payload. The ArrowFileWriter log statements
> indicate that data is getting written to the output stream:
>
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 6
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 2
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 288
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 24
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 216
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 24
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.772 [main] DEBUG org.apache.arrow.vector.file.ArrowWriter - RecordBatch at 304, metadata: 224, body: 56
>
> However, when I wrap that payload into a ByteArrayReadableSeekableByteChannel
> and use ArrowFileReader (along with a BufferAllocator) to read it,
> ArrowFileReader is complaining that it's reading an invalid format, right
> at the point where I call reader.getVectorSchemaRoot():
>
> Exception in thread "main" org.apache.arrow.vector.file.InvalidArrowFileException:
> missing Magic number [0, 0, 42, 0, 0, 0, 0, 0, 0, 0]
>     at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:66)
>     at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:37)
>     at org.apache.arrow.vector.file.ArrowReader.initialize(ArrowReader.java:162)
>     at org.apache.arrow.vector.file.ArrowReader.ensureInitialized(ArrowReader.java:153)
>     at org.apache.arrow.vector.file.ArrowReader.getVectorSchemaRoot(ArrowReader.java:67)
>     at com.bloomberg.andrew.sql.execution.arrow.ArrowConverters.byteArrayToBatch(ArrowConverters.java:89)
[jira] [Created] (ARROW-1619) [Java] Correctly set "lastSet" for variable vectors in JsonReader
Bryan Cutler created ARROW-1619:
--------------------------------

             Summary: [Java] Correctly set "lastSet" for variable vectors in JsonReader
                 Key: ARROW-1619
                 URL: https://issues.apache.org/jira/browse/ARROW-1619
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java - Vectors
            Reporter: Bryan Cutler
            Assignee: Bryan Cutler

The Arrow Java JsonFileReader does not correctly set "lastSet" in
VariableWidthVectors, which makes reading inner vectors overly complicated.
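As a hedged illustration of why "lastSet"-style bookkeeping matters (sketched in Python, not the actual Arrow Java vector code): a variable-width vector stores an offsets buffer, and when slots are skipped (nulls) the offsets must still be filled forward from the last set position, or a reader computing `length = offsets[i+1] - offsets[i]` breaks. The `fill_offsets` helper below is hypothetical.

```python
def fill_offsets(values):
    """Build the offsets buffer for a variable-width vector.

    `values` is a list of bytes-or-None. Null slots contribute no data,
    but they must still get an offset entry carried forward from the
    previous end position, so every slot i has a well-defined length
    offsets[i+1] - offsets[i].
    """
    offsets = [0]
    for v in values:
        end = offsets[-1] + (len(v) if v is not None else 0)
        offsets.append(end)
    return offsets


# Null in the middle: its slot gets a zero-length entry, not a hole.
print(fill_offsets([b"ab", None, b"c"]))  # [0, 2, 2, 3]
```

If the fill-forward step is skipped for null slots (which is what an incorrect "lastSet" can cause), downstream readers see stale or zeroed offsets and must work around it, which is the complication the issue describes.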
[jira] [Created] (ARROW-1618) See if the heap usage in vectors can be reduced.
Siddharth Teotia created ARROW-1618:
------------------------------------

             Summary: See if the heap usage in vectors can be reduced.
                 Key: ARROW-1618
                 URL: https://issues.apache.org/jira/browse/ARROW-1618
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Java - Memory, Java - Vectors
            Reporter: Siddharth Teotia
            Assignee: Siddharth Teotia

We have seen in our tests that there is some scope of improvement as far as
the number of objects and/or sizing of some data structures is concerned.
Re: [VOTE] Release Apache Arrow 0.7.1 - RC1
+1 (binding)

Using dev/release/verify-release-candidate.sh, I:

* Verified signature, checksum on Linux
* Ran C++, Python (+ Parquet support), C, Java, JS (node 6.11.3) unit tests

Using dev/release/verify-release-candidate.bat on Windows / Visual Studio 2015, I:

* Ran C++ and Python unit tests (+ Parquet support)

@Kou, we might want to add "cold start" instructions for people who
want to use the release verification script, and who may not have the
requisite libraries (Ruby and system C libraries) to build the GLib
bindings and run the unit tests.

On Wed, Sep 27, 2017 at 10:01 AM, Wes McKinney wrote:
> Hello all,
>
> I'd like to propose the 2nd release candidate (rc1) of Apache
> Arrow version 0.7.1. This is a bugfix release from 0.7.0. The only
> difference between rc1 and rc0 was fixing an issue with the source
> release build for Windows users.
>
> The source release rc1 is hosted at [1].
>
> This release candidate is based on commit
> 0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit
> tests, and vote on the release. Consider using the release
> verification scripts in [4].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow 0.7.1
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
>
> Thanks,
> Wes
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
> [2]: https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> [3]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> [4]: https://github.com/apache/arrow/tree/master/dev/release
[VOTE] Release Apache Arrow 0.7.1 - RC1
Hello all,

I'd like to propose the 2nd release candidate (rc1) of Apache
Arrow version 0.7.1. This is a bugfix release from 0.7.0. The only
difference between rc1 and rc0 was fixing an issue with the source
release build for Windows users.

The source release rc1 is hosted at [1].

This release candidate is based on commit
0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit
tests, and vote on the release. Consider using the release
verification scripts in [4].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.7.1
[ ] +0
[ ] -1 Do not release this as Apache Arrow 0.7.1 because...

Thanks,
Wes

How to validate a release signature:
https://httpd.apache.org/dev/verification.html

[1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
[2]: https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
[3]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
[4]: https://github.com/apache/arrow/tree/master/dev/release
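For readers following the vote request above, the manual checksum step can be sketched as below. The artifact name is a stand-in generated locally so the commands are self-contained; the real tarball and its signature/checksum files live under [1], and signature verification additionally needs the project's KEYS file imported (e.g. `gpg --verify apache-arrow-0.7.1.tar.gz.asc apache-arrow-0.7.1.tar.gz`).

```shell
# Work in a scratch directory with a stand-in artifact.
cd "$(mktemp -d)"
echo "release contents" > apache-arrow-0.7.1.tar.gz

# Producer side: record the SHA-256 digest of the artifact.
sha256sum apache-arrow-0.7.1.tar.gz > apache-arrow-0.7.1.tar.gz.sha256

# Verifier side: -c recomputes the digest and compares; prints "... OK".
sha256sum -c apache-arrow-0.7.1.tar.gz.sha256
```

In practice the release verification scripts in [4] wrap these steps along with building and running the unit tests.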
Re: ArrowFileReader failing to read bytes written to Java output stream
Thanks for taking a look! I've attached snippets of the code. I also
suspect it has something to do with the way I'm defining the schema. I
should note especially that the helper classes (the ArrowWriter and
ArrowUtils) currently do not support transformations for array/struct data
types; only basic object types that have primitives. For now, I'm trying
to serialize/send over a list of objects that consist of an integer and a
double field/member. Please let me know if there's anything else you'd
like to see.

The current way I'm trying to construct arrow schema objects is by
attempting to transform arbitrary Class types via reflection. At the very
least, the writer seems to be writing stuff. Any insights into this
problem would be helpful, thank you.

From: dev@arrow.apache.org At: 09/26/17 18:59:18
To: Andrew Pham (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
Subject: Re: ArrowFileReader failing to read bytes written to Java output stream

Andrew,

Seems like it fails to read the schema. It has not reached the data part yet.
Can you share your reader/writer code?

On Tue, Sep 26, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <apha...@bloomberg.net> wrote:
> Hello there, I've written something that behaves similarly to:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73
>
> except that, for proof-of-concept purposes, it transforms Java objects
> with data into a byte[] payload. The ArrowFileWriter log statements
> indicate that data is getting written to the output stream:
>
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 6
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 2
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 288
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 24
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 216
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 24
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
> 17:53:16.772 [main] DEBUG org.apache.arrow.vector.file.ArrowWriter - RecordBatch at 304, metadata: 224, body: 56
>
> However, when I wrap that payload into a ByteArrayReadableSeekableByteChannel
> and use ArrowFileReader (along with a BufferAllocator) to read it,
> ArrowFileReader is complaining that it's reading an invalid format, right
> at the point where I call reader.getVectorSchemaRoot():
>
> Exception in thread "main" org.apache.arrow.vector.file.InvalidArrowFileException:
> missing Magic number [0, 0, 42, 0, 0, 0, 0, 0, 0, 0]
>     at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:66)
>     at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:37)
>     at org.apache.arrow.vector.file.ArrowReader.initialize(ArrowReader.java:162)
>     at org.apache.arrow.vector.file.ArrowReader.ensureInitialized(ArrowReader.java:153)
>     at org.apache.arrow.vector.file.ArrowReader.getVectorSchemaRoot(ArrowReader.java:67)
>     at com.bloomberg.andrew.sql.execution.arrow.ArrowConverters.byteArrayToBatch(ArrowConverters.java:89)
>     at com.bloomberg.andrew.sql.execution.arrow.ArrowPayload.loadBatch(ArrowPayload.java:18)
>     at com.bloomberg.andrew.test.arrow.ArrowPublisher.main(ArrowPublisher.java:28)
>
> I'm noticing that the number 42 is exactly the same as the value of the
Re: [VOTE] Release Apache Arrow 0.7.1 - RC0
-1 (binding)

I verified the release on Linux (Ubuntu 14.04) with

    dev/release/verify-release-candidate.sh 0.7.1 0

and it passed perfectly (verifying C++, Python, C / GLib, Java, JS). I
suggest others try the script out for this release.

Unfortunately, it turns out that for the source release script to work
correctly on Windows, git config core.symlinks true must be set, or
symlinked files do not get replaced with their contents. This is
brittleness that we should fix (by consolidating all CMake modules in a
single directory). I will send out a new RC (after I test it on Windows,
of course).

See https://issues.apache.org/jira/browse/ARROW-1617

- Wes

On Wed, Sep 27, 2017 at 12:28 AM, Wes McKinney wrote:
> Hello all,
>
> I'd like to propose the 1st release candidate (rc0) of Apache
> Arrow version 0.7.1. This is a bugfix release from 0.7.0.
>
> The source release rc0 is hosted at [1].
>
> This release candidate is based on commit
> 6354053e39dbed5b5317a5f4070f366833e9544d [2]
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit
> tests, and vote on the release. Consider using the release
> verification scripts in [4].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow 0.7.1
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
>
> Thanks,
> Wes
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc0/
> [2]: https://github.com/apache/arrow/tree/6354053e39dbed5b5317a5f4070f366833e9544d
> [3]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=6354053e39dbed5b5317a5f4070f366833e9544d
> [4]: https://github.com/apache/arrow/tree/master/dev/release
[jira] [Created] (ARROW-1617) [Python] Do not use symlinks in python/cmake_modules
Wes McKinney created ARROW-1617:
--------------------------------

             Summary: [Python] Do not use symlinks in python/cmake_modules
                 Key: ARROW-1617
                 URL: https://issues.apache.org/jira/browse/ARROW-1617
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.8.0

This requires that {{git config core.symlinks true}} be set, which makes
development and source releases (even on Linux/macOS) more brittle.
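As a minimal sketch of how such symlinks can be detected before cutting a source release (the directory layout and file names below are hypothetical stand-ins, not the actual Arrow tree):

```shell
# Build a throwaway tree that mimics a CMake module shared via symlink,
# the pattern that breaks Windows checkouts without core.symlinks=true.
tmp=$(mktemp -d)
mkdir -p "$tmp/cpp/cmake_modules" "$tmp/python/cmake_modules"
echo "# shared module" > "$tmp/cpp/cmake_modules/SharedModule.cmake"
ln -s ../../cpp/cmake_modules/SharedModule.cmake \
      "$tmp/python/cmake_modules/SharedModule.cmake"

# A pre-release check could simply flag every symlink in the tree:
find "$tmp" -type l
```

Consolidating the modules in one directory (as the issue proposes) makes the `find` output empty, so no special git configuration is needed on any platform.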