Re: [VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-27 Thread Kouhei Sutou
Hi,

In 

Re: [VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-27 Thread Li Jin
+1 (non-binding)

built and ran unit tests for Java, C++
ran integration tests
verified signature

On Wed, Sep 27, 2017 at 6:45 PM, Bryan Cutler  wrote:

> +1 (non-binding)
>
> built and ran unit tests for C++, Python, Java
> ran integration tests
>
> On Wed, Sep 27, 2017 at 7:12 AM, Wes McKinney  wrote:
>
> > +1 (binding)
> >
> > using dev/release/verify-release-candidate.sh I
> >
> > * Verified signature, checksum on Linux
> > * Ran C++, Python (+ Parquet support), C, Java, JS (node 6.11.3) unit tests
> >
> > using dev/release/verify-release-candidate.bat on Windows / Visual Studio 2015
> >
> > * Ran C++ and Python unit tests (+ Parquet support)
> >
> > @Kou, we might want to add "cold start" instructions for people who
> > want to use the release verification script, and who may not have the
> > requisite libraries (Ruby and system C libraries) to build the GLib
> > bindings and run the unit tests.
> >
> > On Wed, Sep 27, 2017 at 10:01 AM, Wes McKinney 
> > wrote:
> > > Hello all,
> > >
> > > I'd like to propose the 2nd release candidate (rc1) of Apache
> > > Arrow version 0.7.1.  This is a bugfix release from 0.7.0. The only
> > > difference between rc1 and rc0 was fixing an issue with the source
> > > release build for Windows users.
> > >
> > > The source release rc1 is hosted at [1].
> > >
> > > This release candidate is based on commit
> > > 0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit
> > > tests, and vote on the release. Consider using the release
> > > verification scripts in [4].
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow 0.7.1
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
> > >
> > > Thanks,
> > > Wes
> > >
> > > How to validate a release signature:
> > > https://httpd.apache.org/dev/verification.html
> > >
> > > [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
> > > [2]: https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > > [3]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > > [4]: https://github.com/apache/arrow/tree/master/dev/release
> >
>


[jira] [Created] (ARROW-1622) [Plasma] Plasma doesn't compile with XCode 9

2017-09-27 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-1622:
-

 Summary: [Plasma] Plasma doesn't compile with XCode 9
 Key: ARROW-1622
 URL: https://issues.apache.org/jira/browse/ARROW-1622
 Project: Apache Arrow
  Issue Type: Bug
  Components: Plasma (C++)
Reporter: Philipp Moritz


Compiling the latest arrow with the following flags:

```
cmake -DARROW_PLASMA=on ..
make
```
we get this error:

```
[ 61%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/client.cc.o
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/plasma/client.cc:20:
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/plasma/client.h:31:
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/plasma/common.h:30:
In file included from /Users/rliaw/Research/riselab/ray/src/thirdparty/arrow/cpp/src/arrow/util/logging.h:22:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/iostream:38:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/ios:216:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/__locale:18:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/mutex:189:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/__mutex_base:17:
/Library/Developer/CommandLineTools/usr/include/c++/v1/__threading_support:156:1: error: unknown type name 'mach_port_t'
mach_port_t __libcpp_thread_get_port();
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/__threading_support:300:1: error: unknown type name 'mach_port_t'
mach_port_t __libcpp_thread_get_port() {
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/__threading_support:301:12: error: use of undeclared identifier 'pthread_mach_thread_np'
   return pthread_mach_thread_np(pthread_self());
  ^
3 errors generated.
make[2]: *** [src/plasma/CMakeFiles/plasma_objlib.dir/client.cc.o] Error 1
make[1]: *** [src/plasma/CMakeFiles/plasma_objlib.dir/all] Error 2
make: *** [all] Error 2
```

The problem was discovered and diagnosed in 
https://github.com/apache/arrow/pull/1139
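For readers hitting the same libc++ errors, one workaround that often helps with this class of CommandLineTools header breakage (offered here as an assumption on my part, not necessarily the fix adopted upstream) is to point CMake at the full macOS SDK, where the Mach type definitions are visible:

```shell
# Hypothetical workaround: build against the full macOS SDK instead of the
# stripped-down CommandLineTools headers, which lack mach_port_t here.
SDKROOT="$(xcrun --sdk macosx --show-sdk-path)"
cmake -DARROW_PLASMA=on -DCMAKE_OSX_SYSROOT="$SDKROOT" ..
make
```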



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-27 Thread Bryan Cutler
+1 (non-binding)

built and ran unit tests for C++, Python, Java
ran integration tests

On Wed, Sep 27, 2017 at 7:12 AM, Wes McKinney  wrote:

> +1 (binding)
>
> using dev/release/verify-release-candidate.sh I
>
> * Verified signature, checksum on Linux
> * Ran C++, Python (+ Parquet support), C, Java, JS (node 6.11.3) unit tests
>
> using dev/release/verify-release-candidate.bat on Windows / Visual Studio 2015
>
> * Ran C++ and Python unit tests (+ Parquet support)
>
> @Kou, we might want to add "cold start" instructions for people who
> want to use the release verification script, and who may not have the
> requisite libraries (Ruby and system C libraries) to build the GLib
> bindings and run the unit tests.
>
> On Wed, Sep 27, 2017 at 10:01 AM, Wes McKinney 
> wrote:
> > Hello all,
> >
> > I'd like to propose the 2nd release candidate (rc1) of Apache
> > Arrow version 0.7.1.  This is a bugfix release from 0.7.0. The only
> > difference between rc1 and rc0 was fixing an issue with the source
> > release build for Windows users.
> >
> > The source release rc1 is hosted at [1].
> >
> > This release candidate is based on commit
> > 0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit
> > tests, and vote on the release. Consider using the release
> > verification scripts in [4].
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow 0.7.1
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
> >
> > Thanks,
> > Wes
> >
> > How to validate a release signature:
> > https://httpd.apache.org/dev/verification.html
> >
> > [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
> > [2]: https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > [3]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> > [4]: https://github.com/apache/arrow/tree/master/dev/release
>


[jira] [Created] (ARROW-1621) Reduce Heap Usage per Vector

2017-09-27 Thread Siddharth Teotia (JIRA)
Siddharth Teotia created ARROW-1621:
---

 Summary: Reduce Heap Usage per Vector
 Key: ARROW-1621
 URL: https://issues.apache.org/jira/browse/ARROW-1621
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Memory, Java - Vectors
Reporter: Siddharth Teotia


https://docs.google.com/document/d/1MU-ah_bBHIxXNrd7SkwewGCOOexkXJ7cgKaCis5f-PI/edit





Re: ArrowFileReader failing to read bytes written to Java output stream

2017-09-27 Thread Bryan Cutler
Hi Andrew,

I do not see the attached code, maybe the attachments got stripped?  Is it
small enough to just inline in the message?

Bryan

On Wed, Sep 27, 2017 at 12:26 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <
apha...@bloomberg.net> wrote:

> Also for reference, this is apparently the Arrow Schema used by the
> ArrowFileWriter to write to the output stream (given by
> root.getSchema().toString() and root.getSchema().toJson()):
>
> Schema
> {
>   "fields" : [ {
> "name" : "price",
> "nullable" : true,
> "type" : {
>   "name" : "floatingpoint",
>   "precision" : "DOUBLE"
> },
> "children" : [ ],
> "typeLayout" : {
>   "vectors" : [ {
> "type" : "VALIDITY",
> "typeBitWidth" : 1
>   }, {
> "type" : "DATA",
> "typeBitWidth" : 64
>   } ]
> }
>   }, {
> "name" : "numShares",
> "nullable" : true,
> "type" : {
>   "name" : "int",
>   "bitWidth" : 32,
>   "isSigned" : true
> },
> "children" : [ ],
> "typeLayout" : {
>   "vectors" : [ {
> "type" : "VALIDITY",
> "typeBitWidth" : 1
>   }, {
> "type" : "DATA",
> "typeBitWidth" : 32
>   } ]
> }
>   } ]
> }
>
>
> Given our bytes (wrapped by a SeekableByteChannel), the reader is unable
> to obtain the schema from this.  Any ideas as to what could be happening?
> Cheers!
>
> From: dev@arrow.apache.org At: 09/26/17 18:59:18 To: Andrew Pham (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
> Subject: Re: ArrowFileReader failing to read bytes written to Java output
> stream
>
> Andrew,
>
> Seems like it fails to read the schema. It hasn't reached the data part yet.
> Can you share your reader/writer code?
>
> On Tue, Sep 26, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <
> apha...@bloomberg.net> wrote:
>
> > Hello there, I've written something that behaves similarly to:
> >
> > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73
> >
> > Except that for proof of concept purposes, it transforms Java objects with
> > data into a byte[] payload.  The ArrowFileWriter log statements indicate
> > that data is getting written to the output stream:
> >
> > 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 6
> > 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 2
> > 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 4
> > 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 288
> > 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 4
> > 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> > - Buffer in RecordBatch at 0, length: 1
> > 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> > - Buffer in RecordBatch at 8, length: 24
> > 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> > - Buffer in RecordBatch at 32, length: 1
> > 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> > - Buffer in RecordBatch at 40, length: 12
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 4
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 216
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 4
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 1
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 7
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 24
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 1
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 7
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 12
> > 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> > Writing buffer with size: 4
> > 17:53:16.772 [main] DEBUG org.apache.arrow.vector.file.ArrowWriter -
> > RecordBatch at 304, metadata: 224, body: 56
> >
> >
> > However, when I wrap that payload into a ByteArrayReadableSeekableByteChannel
> > and use ArrowFileReader (along with a BufferAllocator) to read it,
> > ArrowFileReader is complaining that it's reading an invalid format, right
> > at the point where I call reader.getVectorSchemaRoot():
> >
> > Exception in thread "main" org.apache.arrow.vector.file.InvalidArrowFileException:
> > missing Magic number [0, 0, 42, 0, 0, 0, 0, 0, 0, 0]
> >  at 

[jira] [Created] (ARROW-1620) Python: Download Boost in manylinux1 build from bintray

2017-09-27 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-1620:
--

 Summary: Python: Download Boost in manylinux1 build from bintray
 Key: ARROW-1620
 URL: https://issues.apache.org/jira/browse/ARROW-1620
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.8.0


Sourceforge often fails, so use the alternative source. See also 
https://github.com/ray-project/ray/pull/1019 for this.





Re: ArrowFileReader failing to read bytes written to Java output stream

2017-09-27 Thread Andrew Pham (BLOOMBERG/ 731 LEX)
Also for reference, this is apparently the Arrow Schema used by the ArrowFileWriter
to write to the output stream (given by root.getSchema().toString() and 
root.getSchema().toJson()):  

Schema
{
  "fields" : [ {
"name" : "price",
"nullable" : true,
"type" : {
  "name" : "floatingpoint",
  "precision" : "DOUBLE"
},
"children" : [ ],
"typeLayout" : {
  "vectors" : [ {
"type" : "VALIDITY",
"typeBitWidth" : 1
  }, {
"type" : "DATA",
"typeBitWidth" : 64
  } ]
}
  }, {
"name" : "numShares",
"nullable" : true,
"type" : {
  "name" : "int",
  "bitWidth" : 32,
  "isSigned" : true
},
"children" : [ ],
"typeLayout" : {
  "vectors" : [ {
"type" : "VALIDITY",
"typeBitWidth" : 1
  }, {
"type" : "DATA",
"typeBitWidth" : 32
  } ]
}
  } ]
}


Given our bytes (wrapped by a SeekableByteChannel), the reader is unable to 
obtain the schema from this.  Any ideas as to what could be happening?  Cheers!

From: dev@arrow.apache.org At: 09/26/17 18:59:18 To: Andrew Pham (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
Subject: Re: ArrowFileReader failing to read bytes written to Java output stream

Andrew,

Seems like it fails to read the schema. It hasn't reached the data part yet.
Can you share your reader/writer code?

On Tue, Sep 26, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <
apha...@bloomberg.net> wrote:

> Hello there, I've written something that behaves similarly to:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73
>
> Except that for proof of concept purposes, it transforms Java objects with
> data into a byte[] payload.  The ArrowFileWriter log statements indicate
> that data is getting written to the output stream:
>
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 6
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 2
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 288
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 0, length: 1
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 8, length: 24
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 32, length: 1
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 40, length: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 216
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 24
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.772 [main] DEBUG org.apache.arrow.vector.file.ArrowWriter -
> RecordBatch at 304, metadata: 224, body: 56
>
>
> However, when I wrap that payload into a ByteArrayReadableSeekableByteChannel
> and use ArrowFileReader (along with a BufferAllocator) to read it,
> ArrowFileReader is complaining that it's reading an invalid format, right
> at the point where I call reader.getVectorSchemaRoot():
>
> Exception in thread "main" org.apache.arrow.vector.file.InvalidArrowFileException:
> missing Magic number [0, 0, 42, 0, 0, 0, 0, 0, 0, 0]
>  at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:66)
>  at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:37)
>  at org.apache.arrow.vector.file.ArrowReader.initialize(ArrowReader.java:162)
>  at org.apache.arrow.vector.file.ArrowReader.ensureInitialized(ArrowReader.java:153)
>  at org.apache.arrow.vector.file.ArrowReader.getVectorSchemaRoot(ArrowReader.java:67)
>  at com.bloomberg.andrew.sql.execution.arrow.ArrowConverters.

[jira] [Created] (ARROW-1619) [Java] Correctly set "lastSet" for variable vectors in JsonReader

2017-09-27 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1619:
---

 Summary: [Java] Correctly set "lastSet" for variable vectors in 
JsonReader
 Key: ARROW-1619
 URL: https://issues.apache.org/jira/browse/ARROW-1619
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Bryan Cutler
Assignee: Bryan Cutler


The Arrow Java JsonFileReader does not correctly set "lastSet" in 
VariableWidthVectors which makes reading inner vectors overly complicated.





[jira] [Created] (ARROW-1618) See if the heap usage in vectors can be reduced.

2017-09-27 Thread Siddharth Teotia (JIRA)
Siddharth Teotia created ARROW-1618:
---

 Summary: See if the heap usage in vectors can be reduced.
 Key: ARROW-1618
 URL: https://issues.apache.org/jira/browse/ARROW-1618
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Memory, Java - Vectors
Reporter: Siddharth Teotia
Assignee: Siddharth Teotia


We have seen in our tests that there is some scope for improvement in the
number of objects and/or the sizing of some data structures.





Re: [VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-27 Thread Wes McKinney
+1 (binding)

using dev/release/verify-release-candidate.sh I

* Verified signature, checksum on Linux
* Ran C++, Python (+ Parquet support), C, Java, JS (node 6.11.3) unit tests

using dev/release/verify-release-candidate.bat on Windows / Visual Studio 2015

* Ran C++ and Python unit tests (+ Parquet support)

@Kou, we might want to add "cold start" instructions for people who
want to use the release verification script, and who may not have the
requisite libraries (Ruby and system C libraries) to build the GLib
bindings and run the unit tests.

On Wed, Sep 27, 2017 at 10:01 AM, Wes McKinney  wrote:
> Hello all,
>
> I'd like to propose the 2nd release candidate (rc1) of Apache
> Arrow version 0.7.1.  This is a bugfix release from 0.7.0. The only
> difference between rc1 and rc0 was fixing an issue with the source
> release build for Windows users.
>
> The source release rc1 is hosted at [1].
>
> This release candidate is based on commit
> 0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit
> tests, and vote on the release. Consider using the release
> verification scripts in [4].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow 0.7.1
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
>
> Thanks,
> Wes
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
> [2]: 
> https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> [3]: 
> https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
> [4]: https://github.com/apache/arrow/tree/master/dev/release


[VOTE] Release Apache Arrow 0.7.1 - RC1

2017-09-27 Thread Wes McKinney
Hello all,

I'd like to propose the 2nd release candidate (rc1) of Apache
Arrow version 0.7.1.  This is a bugfix release from 0.7.0. The only
difference between rc1 and rc0 was fixing an issue with the source
release build for Windows users.

The source release rc1 is hosted at [1].

This release candidate is based on commit
0e21f84c2fc26dba949a03ee7d7ebfade0a65b81 [2]

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit
tests, and vote on the release. Consider using the release
verification scripts in [4].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.7.1
[ ] +0
[ ] -1 Do not release this as Apache Arrow 0.7.1 because...

Thanks,
Wes

How to validate a release signature:
https://httpd.apache.org/dev/verification.html

[1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/
[2]: 
https://github.com/apache/arrow/tree/0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
[3]: 
https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=0e21f84c2fc26dba949a03ee7d7ebfade0a65b81
[4]: https://github.com/apache/arrow/tree/master/dev/release
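For voters verifying by hand rather than via the scripts in [4], the signature and checksum checks might look like the sketch below. The exact artifact and checksum file names are assumptions based on the usual Apache dist layout, not taken from the listing above:

```shell
# Hedged sketch: manual source-release verification (artifact names assumed).
# Fetch the tarball and its detached signature from the RC directory in [1].
curl -O https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/apache-arrow-0.7.1.tar.gz
curl -O https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc1/apache-arrow-0.7.1.tar.gz.asc
# Verify the detached GPG signature (import the release signing keys first).
gpg --verify apache-arrow-0.7.1.tar.gz.asc apache-arrow-0.7.1.tar.gz
# Compute a local digest and compare it with the published checksum file.
sha512sum apache-arrow-0.7.1.tar.gz
```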


Re: ArrowFileReader failing to read bytes written to Java output stream

2017-09-27 Thread Andrew Pham (BLOOMBERG/ 731 LEX)
Thanks for taking a look!  I've attached snippets of the code.  I also suspect 
it has something to do with the way I'm defining the schema.  I should note 
especially that the helper classes (the ArrowWriter and ArrowUtils) currently 
do not support transformations for array/struct data types; only basic object 
types that have primitives.  For now, I'm trying to serialize/send over a list 
of objects that consist of an integer and a double field/member.

Please let me know if there's anything else you'd like to see.  The current way 
I'm trying to construct arrow schema objects is by attempting to transform 
arbitrary Class types via reflection.  At the very least, the writer seems 
to be writing stuff.  

Any insights into this problem would be helpful, thank you

From: dev@arrow.apache.org At: 09/26/17 18:59:18 To: Andrew Pham (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
Subject: Re: ArrowFileReader failing to read bytes written to Java output stream

Andrew,

Seems like it fails to read the schema. It hasn't reached the data part yet.
Can you share your reader/writer code?

On Tue, Sep 26, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX) <
apha...@bloomberg.net> wrote:

> Hello there, I've written something that behaves similarly to:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73
>
> Except that for proof of concept purposes, it transforms Java objects with
> data into a byte[] payload.  The ArrowFileWriter log statements indicate
> that data is getting written to the output stream:
>
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 6
> 17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 2
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 288
> 17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 0, length: 1
> 17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 8, length: 24
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 32, length: 1
> 17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch
> - Buffer in RecordBatch at 40, length: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 216
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 24
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 1
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 7
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 12
> 17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel -
> Writing buffer with size: 4
> 17:53:16.772 [main] DEBUG org.apache.arrow.vector.file.ArrowWriter -
> RecordBatch at 304, metadata: 224, body: 56
>
>
> However, when I wrap that payload into a ByteArrayReadableSeekableByteChannel
> and use ArrowFileReader (along with a BufferAllocator) to read it,
> ArrowFileReader is complaining that it's reading an invalid format, right
> at the point where I call reader.getVectorSchemaRoot():
>
> Exception in thread "main" org.apache.arrow.vector.file.InvalidArrowFileException:
> missing Magic number [0, 0, 42, 0, 0, 0, 0, 0, 0, 0]
>  at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:66)
>  at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:37)
>  at org.apache.arrow.vector.file.ArrowReader.initialize(ArrowReader.java:162)
>  at org.apache.arrow.vector.file.ArrowReader.ensureInitialized(ArrowReader.java:153)
>  at org.apache.arrow.vector.file.ArrowReader.getVectorSchemaRoot(ArrowReader.java:67)
>  at com.bloomberg.andrew.sql.execution.arrow.ArrowConverters.byteArrayToBatch(ArrowConverters.java:89)
>  at com.bloomberg.andrew.sql.execution.arrow.ArrowPayload.loadBatch(ArrowPayload.java:18)
>  at com.bloomberg.andrew.test.arrow.ArrowPublisher.main(ArrowPublisher.java:28)
>
>
> I'm noticing that the number 42 is exactly the same as the value of the

Re: [VOTE] Release Apache Arrow 0.7.1 - RC0

2017-09-27 Thread Wes McKinney
-1 (binding)

I verified the release on Linux (Ubuntu 14.04) with

dev/release/verify-release-candidate.sh 0.7.1 0

and it passed perfectly (verifying C++, Python, C / GLib, Java, JS). I
suggest others try the script out for this release

Unfortunately, it turns out that for the source release script to work
correctly on Windows, the setting

git config core.symlinks true

must be set, or symlinked files do not get replaced with their
contents. This is brittleness that we should fix (by consolidating all
CMake modules in a single directory).

I will send out a new RC (after I test it on Windows, of course). See
https://issues.apache.org/jira/browse/ARROW-1617

- Wes

On Wed, Sep 27, 2017 at 12:28 AM, Wes McKinney  wrote:
> Hello all,
>
> I'd like to propose the 1st release candidate (rc0) of Apache
> Arrow version 0.7.1.  This is a bugfix release from 0.7.0.
>
> The source release rc0 is hosted at [1].
>
> This release candidate is based on commit
> 6354053e39dbed5b5317a5f4070f366833e9544d [2]
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit
> tests, and vote on the release. Consider using the release
> verification scripts in [4].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow 0.7.1
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.7.1 because...
>
> Thanks,
> Wes
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.1-rc0/
> [2]: 
> https://github.com/apache/arrow/tree/6354053e39dbed5b5317a5f4070f366833e9544d
> [3]: 
> https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=6354053e39dbed5b5317a5f4070f366833e9544d
> [4]: https://github.com/apache/arrow/tree/master/dev/release


[jira] [Created] (ARROW-1617) [Python] Do not use symlinks in python/cmake_modules

2017-09-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1617:
---

 Summary: [Python] Do not use symlinks in python/cmake_modules 
 Key: ARROW-1617
 URL: https://issues.apache.org/jira/browse/ARROW-1617
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.8.0


This requires that {{git config core.symlinks true}} be set, which makes 
development and source releases (on Linux/macOS even) more brittle


