[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262067#comment-16262067
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

trxcllnt commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346261217
 
 
   @wesm ok, I added `dev/release/js-source-release.sh` and `js/npm-release.sh` 
scripts:
   
   ```sh
   cd $ARROW_HOME
   # create a release branch (assuming patch release 0.1.2 -> 0.1.3)
   git checkout -b release-js-0_1_3
   # create a new v0.1.3 tag
   git tag -a apache-arrow-js-0.1.3 -m "v0.1.3"
   ./dev/release/js-source-release.sh
   # > Usage: ./dev/release/js-source-release.sh <js-version> <arrow-version> <rc>

   
   # - runs `npm version 0.1.3 --no-git-tag-version`
   # - git-commits the new package.json@0.1.3
   # - creates a tarball at dev/release/js-tmp/apache-arrow-js-0.1.3.tar.gz
   # - copies signed tarballs to dev/release/js-tmp/js-rc-tmp/apache-arrow-js-0.1.3-rc0
   # - svn commits the dev/release/js-tmp/js-rc-tmp/apache-arrow-js-0.1.3-rc0 directory
   
   # js-version 0.1.3, arrow-version 0.8.0, rc 0
   ./dev/release/js-source-release.sh 0.1.3 0.8.0 0
   ```
   
   And when we're ready to publish the modules to npm, we can run:
   ```sh
   cd apache-arrow-js-0.1.3-rc0
   tar -xzf *.tar.gz
   cd apache-arrow-js-0.1.3
   # provide the same arrow-version as `js-source-release.sh`,
   # so we can add an npm dist-tag that indicates which Arrow
   # release each npm version is compatible with
   npm run release -- 0.8.0
   # available at either
   npm install apache-arrow@0.1.3
   # or 
   npm install apache-arrow@v0.8.0
   ```
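   For reference, a minimal sketch of how those two install targets could be wired up with npm dist-tags (hypothetical commands for illustration only; the actual `npm-release.sh` is the source of truth):
   ```sh
   # publish the package, then tag this npm version with the Arrow release it targets
   npm publish                                  # publishes apache-arrow@0.1.3
   npm dist-tag add apache-arrow@0.1.3 v0.8.0   # `npm install apache-arrow@v0.8.0` now resolves to 0.1.3
   npm dist-tag ls apache-arrow                 # inspect the tags
   ```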
   
   At the moment the `npm-release.sh` script [executes the integration 
tests](https://github.com/apache/arrow/pull/1346/commits/c9cb2e04ec4a85fe1e49d1c023b15fceb3598952#diff-a8aff12ce8d320cec92901f4a8129ed3R27)
 before publishing.
   
   If the `npm run release` step is run with the [integration test 
envars](https://github.com/trxcllnt/arrow/blob/js-asf-release-scripts/js/gulp/test-task.js#L66)
 available (or run at the same directory level as `arrow/js`), it should work 
fine. If that's too much of a hassle, feel free to comment them out:
   ```sh
   # validate the targets pass all tests before publishing
   npm install
   # npx run-s clean:all lint create:testdata build
   # npm run test -- -t ts -u --integration
   # npm run test -- --integration
   npx run-s clean:all lint build
   npm run test
   # ...
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes
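
The checklist above maps roughly onto the following shell steps (a hedged sketch with placeholder version numbers and an assumed dev SVN layout; the `dev/release/js-source-release.sh` and `js/npm-release.sh` scripts from the PR are authoritative):

```sh
# sign the source tarball and create checksum files (placeholder version/rc numbers)
VERSION=0.1.3
RC=0
TARBALL=apache-arrow-js-${VERSION}.tar.gz

gpg --armor --detach-sign "${TARBALL}"            # produces ${TARBALL}.asc
md5sum "${TARBALL}" > "${TARBALL}.md5"
shasum -a 512 "${TARBALL}" > "${TARBALL}.sha512"

# upload the release candidate to the Apache dev SVN for the vote
svn checkout https://dist.apache.org/repos/dist/dev/arrow arrow-dist-dev
mkdir -p "arrow-dist-dev/apache-arrow-js-${VERSION}-rc${RC}"
cp "${TARBALL}" "${TARBALL}".{asc,md5,sha512} "arrow-dist-dev/apache-arrow-js-${VERSION}-rc${RC}/"
(cd arrow-dist-dev && svn add "apache-arrow-js-${VERSION}-rc${RC}" && svn commit -m "Apache Arrow JS ${VERSION} RC${RC}")

# what a release voter would run to verify the artifacts
gpg --verify "${TARBALL}.asc" "${TARBALL}"
md5sum -c "${TARBALL}.md5"
shasum -a 512 -c "${TARBALL}.sha512"
```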



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262021#comment-16262021
 ] 

ASF GitHub Bot commented on ARROW-1795:
---

luchy0120 commented on issue #1327: ARROW-1795: [Plasma] Create flag to make 
Plasma store use a single memory-mapped file.
URL: https://github.com/apache/arrow/pull/1327#issuecomment-346253309
 
 
   @pcmoritz, thanks for the explanation, I'll do more testing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Plasma C++] change evict policy
> 
>
> Key: ARROW-1795
> URL: https://issues.apache.org/jira/browse/ARROW-1795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Lu Qi 
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Case 1: say we have 8 GB of total memory, 5 GB of data is already stored, and then 
> another 6 GB of data arrives. If we ask to evict a full 6 GB, the store throws an 
> exception saying that no objects can be freed. This is because we did not count the 
> 3 GB of memory that is still free: if we count that remaining 3 GB, we only need to 
> ask for 3 GB, so we can evict the 5 GB of stored data and the store stays alive.
> Case 2: if we have 10 GB of free memory, we store 1.5 GB of data, and then another 
> 9 GB of data arrives, evicting 10 GB * 20% = 2 GB makes the store crash. In this 
> situation we need to evict 9 + 1.5 - 10 = 0.5 GB instead.
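
A worked illustration of the arithmetic in both cases (my own shell sketch, not Plasma's C++ implementation): the amount to evict is the shortfall after counting the space that is already free, i.e. max(0, request + in_use - capacity).

```sh
# Sketch only: compute how many bytes need to be evicted, counting existing free space.
bytes_to_evict() {
  local capacity=$1 in_use=$2 request=$3
  local need=$(( request + in_use - capacity ))
  echo $(( need > 0 ? need : 0 ))
}

bytes_to_evict $((8<<30))  $((5<<30)) $((6<<30))   # case 1: prints 3 GiB, not 6 GiB
bytes_to_evict $((10<<30)) $((3<<29)) $((9<<30))   # case 2: prints 0.5 GiB, not 20% of capacity
```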



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261939#comment-16261939
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on issue #1330: wip: ARROW-1816: [Java] Resolve new vector 
classes structure for timestamp, date and maybe interval
URL: https://github.com/apache/arrow/pull/1330#issuecomment-346236125
 
 
   @BryanCutler I am a bit reluctant to check `unit` and `timezone` in value 
holders for performance reasons. This is currently not checked for other types 
with type params either, such as decimal. Maybe we should revisit this problem 
as a whole in a follow-up?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261934#comment-16261934
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346235425
 
 
   This looks good to me. Once the package name hierarchy is settled, I think 
this should be good to go. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261929#comment-16261929
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

trxcllnt opened a new pull request #1346: ARROW-1577: [JS] add ASF release 
scripts
URL: https://github.com/apache/arrow/pull/1346
 
 
   @wesm does 
[this](https://github.com/apache/arrow/commit/b270dbad4f4fd70e19613b93148ec2ae3596e1fd#diff-fc8acbd4f42fb5e6b0cad14928b68115R59)
 look good


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1577:
--
Labels: pull-request-available  (was: )

> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1828) [C++] Implement hash kernel specialization for BooleanType

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1828:
---

Assignee: Wes McKinney

> [C++] Implement hash kernel specialization for BooleanType
> --
>
> Key: ARROW-1828
> URL: https://issues.apache.org/jira/browse/ARROW-1828
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> Follow up to ARROW-1559



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261761#comment-16261761
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152441194
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java
 ##
 @@ -21,332 +21,492 @@
 import static com.google.common.base.Preconditions.checkNotNull;
 
 import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Iterator;
+import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
-import java.util.Map;
 
-import javax.annotation.Nullable;
-
-import com.google.common.base.Preconditions;
-import com.google.common.collect.Ordering;
-import com.google.common.primitives.Ints;
+import com.google.common.collect.ObjectArrays;
 
 import io.netty.buffer.ArrowBuf;
-
+import org.apache.arrow.memory.BaseAllocator;
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.vector.*;
-import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl;
-import org.apache.arrow.vector.complex.reader.FieldReader;
+import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl;
+import org.apache.arrow.vector.complex.impl.NullableMapWriter;
 import org.apache.arrow.vector.holders.ComplexHolder;
-import org.apache.arrow.vector.types.Types.MinorType;
+import org.apache.arrow.vector.schema.ArrowFieldNode;
 import org.apache.arrow.vector.types.pojo.ArrowType;
-import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.ArrowType.Struct;
+import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
 import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Field;
 import org.apache.arrow.vector.util.CallBack;
-import org.apache.arrow.vector.util.JsonStringHashMap;
+import org.apache.arrow.vector.util.OversizedAllocationException;
 import org.apache.arrow.vector.util.TransferPair;
 
-public class MapVector extends AbstractMapVector {
-  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MapVector.class);
+public class MapVector extends NonNullableMapVector implements FieldVector {
 
 Review comment:
   Yes, the above is correct and is what I was referencing in (1) from 
https://github.com/apache/arrow/pull/1341#issuecomment-345871911. Maybe I 
missed the discussion, but I thought we would be removing 
`NonNullableMapVector` and combining the 2 classes. Why do we need to keep 
both?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261754#comment-16261754
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

siddharthteotia commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152440144
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java
 ##
 @@ -21,332 +21,492 @@
 import static com.google.common.base.Preconditions.checkNotNull;
 
 import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Iterator;
+import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
-import java.util.Map;
 
-import javax.annotation.Nullable;
-
-import com.google.common.base.Preconditions;
-import com.google.common.collect.Ordering;
-import com.google.common.primitives.Ints;
+import com.google.common.collect.ObjectArrays;
 
 import io.netty.buffer.ArrowBuf;
-
+import org.apache.arrow.memory.BaseAllocator;
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.vector.*;
-import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl;
-import org.apache.arrow.vector.complex.reader.FieldReader;
+import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl;
+import org.apache.arrow.vector.complex.impl.NullableMapWriter;
 import org.apache.arrow.vector.holders.ComplexHolder;
-import org.apache.arrow.vector.types.Types.MinorType;
+import org.apache.arrow.vector.schema.ArrowFieldNode;
 import org.apache.arrow.vector.types.pojo.ArrowType;
-import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.ArrowType.Struct;
+import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
 import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Field;
 import org.apache.arrow.vector.util.CallBack;
-import org.apache.arrow.vector.util.JsonStringHashMap;
+import org.apache.arrow.vector.util.OversizedAllocationException;
 import org.apache.arrow.vector.util.TransferPair;
 
-public class MapVector extends AbstractMapVector {
-  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MapVector.class);
+public class MapVector extends NonNullableMapVector implements FieldVector {
 
 Review comment:
   This is slightly different from changing the name of NullableInt to Int, 
since there we are going to remove the latter (the old non-nullable Int). 
   
   For the MapVector case, I don't think we are removing anything; instead we 
are keeping both the base class MapVector and the subclass NullableMapVector. 
So there is probably no need to change the name here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261747#comment-16261747
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152439921
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   I prefer `ipc.ArrowStreamReader` to `ipc.stream.ArrowStreamReader`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261744#comment-16261744
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

siddharthteotia commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152439861
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java
 ##
 @@ -21,332 +21,492 @@
 import static com.google.common.base.Preconditions.checkNotNull;
 
 import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Iterator;
+import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
-import java.util.Map;
 
-import javax.annotation.Nullable;
-
-import com.google.common.base.Preconditions;
-import com.google.common.collect.Ordering;
-import com.google.common.primitives.Ints;
+import com.google.common.collect.ObjectArrays;
 
 import io.netty.buffer.ArrowBuf;
-
+import org.apache.arrow.memory.BaseAllocator;
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.vector.*;
-import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl;
-import org.apache.arrow.vector.complex.reader.FieldReader;
+import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl;
+import org.apache.arrow.vector.complex.impl.NullableMapWriter;
 import org.apache.arrow.vector.holders.ComplexHolder;
-import org.apache.arrow.vector.types.Types.MinorType;
+import org.apache.arrow.vector.schema.ArrowFieldNode;
 import org.apache.arrow.vector.types.pojo.ArrowType;
-import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.ArrowType.Struct;
+import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
 import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Field;
 import org.apache.arrow.vector.util.CallBack;
-import org.apache.arrow.vector.util.JsonStringHashMap;
+import org.apache.arrow.vector.util.OversizedAllocationException;
 import org.apache.arrow.vector.util.TransferPair;
 
-public class MapVector extends AbstractMapVector {
-  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MapVector.class);
+public class MapVector extends NonNullableMapVector implements FieldVector {
 
 Review comment:
   If I understand it correctly, the original MapVector has now become 
NonNullableMapVector, and NullableMapVector (a subclass of MapVector) has now 
become MapVector. @BryanCutler can confirm.
   
   I would prefer not to change the naming here, since there are essentially two 
classes -- NullableMapVector and MapVector -- where the former is a MapVector 
with a validity buffer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261736#comment-16261736
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152439296
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java
 ##
 @@ -21,332 +21,492 @@
 import static com.google.common.base.Preconditions.checkNotNull;
 
 import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Iterator;
+import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
-import java.util.Map;
 
-import javax.annotation.Nullable;
-
-import com.google.common.base.Preconditions;
-import com.google.common.collect.Ordering;
-import com.google.common.primitives.Ints;
+import com.google.common.collect.ObjectArrays;
 
 import io.netty.buffer.ArrowBuf;
-
+import org.apache.arrow.memory.BaseAllocator;
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.vector.*;
-import org.apache.arrow.vector.complex.impl.SingleMapReaderImpl;
-import org.apache.arrow.vector.complex.reader.FieldReader;
+import org.apache.arrow.vector.complex.impl.NullableMapReaderImpl;
+import org.apache.arrow.vector.complex.impl.NullableMapWriter;
 import org.apache.arrow.vector.holders.ComplexHolder;
-import org.apache.arrow.vector.types.Types.MinorType;
+import org.apache.arrow.vector.schema.ArrowFieldNode;
 import org.apache.arrow.vector.types.pojo.ArrowType;
-import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.ArrowType.Struct;
+import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
 import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Field;
 import org.apache.arrow.vector.util.CallBack;
-import org.apache.arrow.vector.util.JsonStringHashMap;
+import org.apache.arrow.vector.util.OversizedAllocationException;
 import org.apache.arrow.vector.util.TransferPair;
 
-public class MapVector extends AbstractMapVector {
-  //private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MapVector.class);
+public class MapVector extends NonNullableMapVector implements FieldVector {
 
 Review comment:
   Is `NonNullableMapVector` something new? Can you explain the change of class 
hierarchy here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261735#comment-16261735
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152439165
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   @icexelloss do you have an opinion on this? Would be good to get this patch 
in soon to facilitate testing


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1808) [C++] Make RecordBatch interface virtual to permit record batches that lazy-materialize columns

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261734#comment-16261734
 ] 

ASF GitHub Bot commented on ARROW-1808:
---

kou commented on issue #1337: ARROW-1808: [C++] Make RecordBatch, Table virtual 
interfaces for column access
URL: https://github.com/apache/arrow/pull/1337#issuecomment-346203089
 
 
   > It seems like respecting the schema is the right approach.
   
   I agree with you.
   
   > I could add boundschecking to `SimpleRecordBatch::column(i)` and return 
null if the index is out of bounds, would that help at all?
   
   I think it's better to do that in a higher layer, such as the GLib bindings 
layer, and to avoid needless checks in the C++ layer for simplicity and 
performance.
   
   I'll add bounds checking in the GLib bindings later. For now, I think it's 
better that we always validate a newly created record batch. If we always 
validate it, we can assume that all record batches have valid data.
   
   ```patch
   From d9260c09765b1cd337cda5a09497ee1b985ef623 Mon Sep 17 00:00:00 2001
   From: Kouhei Sutou 
   Date: Wed, 22 Nov 2017 09:09:30 +0900
   Subject: [PATCH] [GLib] Always validate on creating new record batch
   
   ---
c_glib/arrow-glib/record-batch.cpp | 13 ---
c_glib/arrow-glib/record-batch.h   |  3 ++-
c_glib/example/go/write-batch.go   | 10 +++--
c_glib/example/go/write-stream.go  | 10 +++--
c_glib/test/test-record-batch.rb   | 46 
+-
5 files changed, 58 insertions(+), 24 deletions(-)
   
   diff --git a/c_glib/arrow-glib/record-batch.cpp 
b/c_glib/arrow-glib/record-batch.cpp
   index f23a0cf7..73de6eeb 100644
   --- a/c_glib/arrow-glib/record-batch.cpp
   +++ b/c_glib/arrow-glib/record-batch.cpp
   @@ -135,13 +135,15 @@ garrow_record_batch_class_init(GArrowRecordBatchClass 
*klass)
 * @schema: The schema of the record batch.
 * @n_rows: The number of the rows in the record batch.
 * @columns: (element-type GArrowArray): The columns in the record batch.
   + * @error: (nullable): Return location for a #GError or %NULL.
 *
   - * Returns: A newly created #GArrowRecordBatch.
   + * Returns: (nullable): A newly created #GArrowRecordBatch or %NULL on 
error.
 */
GArrowRecordBatch *
garrow_record_batch_new(GArrowSchema *schema,
guint32 n_rows,
   -GList *columns)
   +GList *columns,
   +GError **error)
{
  std::vector arrow_columns;
  for (GList *node = columns; node; node = node->next) {
   @@ -152,7 +154,12 @@ garrow_record_batch_new(GArrowSchema *schema,
  auto arrow_record_batch =
arrow::RecordBatch::Make(garrow_schema_get_raw(schema),
 n_rows, arrow_columns);
   -  return garrow_record_batch_new_raw(_record_batch);
   +  auto status = arrow_record_batch->Validate();
   +  if (garrow_error_check(error, status, "[record-batch][new]")) {
   +return garrow_record_batch_new_raw(_record_batch);
   +  } else {
   +return NULL;
   +  }
}

/**
   diff --git a/c_glib/arrow-glib/record-batch.h 
b/c_glib/arrow-glib/record-batch.h
   index 021f894f..823a42bb 100644
   --- a/c_glib/arrow-glib/record-batch.h
   +++ b/c_glib/arrow-glib/record-batch.h
   @@ -68,7 +68,8 @@ GType garrow_record_batch_get_type(void) G_GNUC_CONST;

GArrowRecordBatch *garrow_record_batch_new(GArrowSchema *schema,
   guint32 n_rows,
   -   GList *columns);
   +   GList *columns,
   +   GError **error);

gboolean garrow_record_batch_equal(GArrowRecordBatch *record_batch,
   GArrowRecordBatch *other_record_batch);
   diff --git a/c_glib/example/go/write-batch.go 
b/c_glib/example/go/write-batch.go
   index 9dbc3c00..f4d03ed9 100644
   --- a/c_glib/example/go/write-batch.go
   +++ b/c_glib/example/go/write-batch.go
   @@ -188,7 +188,10 @@ func main() {
BuildDoubleArray(),
}

   -recordBatch := arrow.NewRecordBatch(schema, 4, columns)
   +recordBatch, err := arrow.NewRecordBatch(schema, 4, columns)
   +if err != nil {
   +log.Fatalf("Failed to create record batch #1: %v", err)
   +}
_, err = writer.WriteRecordBatch(recordBatch)
if err != nil {
log.Fatalf("Failed to write record batch #1: %v", err)
   @@ -198,7 +201,10 @@ func main() {
for i, column := range columns {
slicedColumns[i] = column.Slice(1, 3)
}
   -recordBatch = arrow.NewRecordBatch(schema, 3, slicedColumns)
   +recordBatch, err = arrow.NewRecordBatch(schema, 3, 

[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261733#comment-16261733
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346203057
 
 
   @BryanCutler Good work! High level looks good. I will try to look more 
closely tomorrow.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261726#comment-16261726
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152437974
 
 

 ##
 File path: java/vector/src/main/codegen/templates/MapWriters.java
 ##
 @@ -242,7 +242,7 @@ public void end() {
   <#assign constructorParams = minor.arrowTypeConstructorParams />
 <#else>
   <#assign constructorParams = [] />
-  <#list minor.typeParams?reverse as typeParam>
+  <#list minor.typeParams as typeParam>
 
 Review comment:
   Making the param order more sane... it's driving me nuts that we are flipping 
the ordering twice.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261723#comment-16261723
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152437699
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/types/TimeUnit.java
 ##
 @@ -19,10 +19,10 @@
 package org.apache.arrow.vector.types;
 
 public enum TimeUnit {
-  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND),
-  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND),
-  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND),
-  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND);
+  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND, 
java.util.concurrent.TimeUnit.SECONDS),
+  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND, 
java.util.concurrent.TimeUnit.MILLISECONDS),
+  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND, 
java.util.concurrent.TimeUnit.MICROSECONDS),
+  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND, 
java.util.concurrent.TimeUnit.NANOSECONDS);
 
 Review comment:
   Not sure why that's better... why do you think so? Maybe I missed something.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1341) [C++] Deprecate arrow::MakeTable in favor of new ctor from ARROW-1334

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1341.
-
Resolution: Fixed

Resolved in 
https://github.com/apache/arrow/commit/fc4e2c36d2c56a8bd5d1ab17eeb406826924d3e5#diff-5e6936b196075b1da885ea3592cb23bd

> [C++] Deprecate arrow::MakeTable in favor of new ctor from ARROW-1334
> -
>
> Key: ARROW-1341
> URL: https://issues.apache.org/jira/browse/ARROW-1341
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> Small oversight not doing this already in ARROW-1334



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1843) Merge tool occasionally leaves JIRAs in an invalid state

2017-11-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1843:
---

 Summary: Merge tool occasionally leaves JIRAs in an invalid state
 Key: ARROW-1843
 URL: https://issues.apache.org/jira/browse/ARROW-1843
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Wes McKinney


I have been noticing that some patches are getting left in the "In Progress" or 
"Open" state even though, in the web UI, the JIRA appears to be resolved. I have 
been having to reopen these issues and then press "Resolve" in the web UI. This 
isn't happening 100% of the time, but it has happened several times today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (ARROW-1808) [C++] Make RecordBatch interface virtual to permit record batches that lazy-materialize columns

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reopened ARROW-1808:
-

> [C++] Make RecordBatch interface virtual to permit record batches that 
> lazy-materialize columns
> ---
>
> Key: ARROW-1808
> URL: https://issues.apache.org/jira/browse/ARROW-1808
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This should be looked at soon to prevent having to define a different virtual 
> interface for record batches. There are places where we are using the record 
> batch constructor directly, and in some third party code (like MapD), so this 
> might be good to get done for 0.8.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261714#comment-16261714
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152436633
 
 

 ##
 File path: java/vector/src/main/codegen/data/ValueVectorTypes.tdd
 ##
 @@ -116,7 +100,7 @@
 {
   class: "Decimal",
   maxPrecisionDigits: 38, nDecimalDigits: 4, friendlyType: 
"BigDecimal",
-  typeParams: [ {name: "scale", type: "int"}, { name: "precision", 
type: "int"}],
+  typeParams: [ { name: "precision", type: "int"}, {name: "scale", 
type: "int"} ],
 
 Review comment:
   The reverse ordering of these typeParams is driving me nuts... I changed it 
so that they follow the same order everywhere. Also see: 
https://github.com/apache/arrow/pull/1330/files#r152436539
   
   The end result is the same because the order is flipped twice.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1808) [C++] Make RecordBatch interface virtual to permit record batches that lazy-materialize columns

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1808.
-
Resolution: Fixed

> [C++] Make RecordBatch interface virtual to permit record batches that 
> lazy-materialize columns
> ---
>
> Key: ARROW-1808
> URL: https://issues.apache.org/jira/browse/ARROW-1808
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This should be looked at soon to prevent having to define a different virtual 
> interface for record batches. There are places where we are using the record 
> batch constructor directly, and in some third party code (like MapD), so this 
> might be good to get done for 0.8.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1808) [C++] Make RecordBatch interface virtual to permit record batches that lazy-materialize columns

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261707#comment-16261707
 ] 

ASF GitHub Bot commented on ARROW-1808:
---

wesm commented on issue #1337: ARROW-1808: [C++] Make RecordBatch, Table 
virtual interfaces for column access
URL: https://github.com/apache/arrow/pull/1337#issuecomment-346200509
 
 
   Merging, since the Linux build now fails only when it reaches 
parquet-cpp. I will update the parquet-cpp patch and then merge that once its 
build passes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Make RecordBatch interface virtual to permit record batches that 
> lazy-materialize columns
> ---
>
> Key: ARROW-1808
> URL: https://issues.apache.org/jira/browse/ARROW-1808
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This should be looked at soon to prevent having to define a different virtual 
> interface for record batches. There are places where we are using the record 
> batch constructor directly, and in some third party code (like MapD), so this 
> might be good to get done for 0.8.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261712#comment-16261712
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152436539
 
 

 ##
 File path: java/vector/src/main/codegen/templates/FixedValueVectors.java
 ##
 @@ -56,7 +56,7 @@
   private int allocationMonitor = 0;
   <#if minor.typeParams??>
 
-<#assign typeParams = minor.typeParams?reverse />
+<#assign typeParams = minor.typeParams />
 
 Review comment:
   Making type params order more sane...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1808) [C++] Make RecordBatch interface virtual to permit record batches that lazy-materialize columns

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261710#comment-16261710
 ] 

ASF GitHub Bot commented on ARROW-1808:
---

wesm closed pull request #1337: ARROW-1808: [C++] Make RecordBatch, Table 
virtual interfaces for column access
URL: https://github.com/apache/arrow/pull/1337
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.travis.yml b/.travis.yml
index 9c714a689..ddadf739a 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -55,6 +55,7 @@ matrix:
 - export ARROW_TRAVIS_VALGRIND=1
 - export ARROW_TRAVIS_PLASMA=1
 - export ARROW_TRAVIS_CLANG_FORMAT=1
+- export ARROW_BUILD_WARNING_LEVEL=CHECKIN
 - export CC="clang-4.0"
 - export CXX="clang++-4.0"
 - $TRAVIS_BUILD_DIR/ci/travis_install_clang_tools.sh
@@ -74,6 +75,7 @@ matrix:
 before_script:
 - export ARROW_TRAVIS_USE_TOOLCHAIN=1
 - export ARROW_TRAVIS_PLASMA=1
+- export ARROW_BUILD_WARNING_LEVEL=CHECKIN
 - travis_wait 50 $TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh
 script:
 - $TRAVIS_BUILD_DIR/ci/travis_script_cpp.sh
diff --git a/c_glib/arrow-glib/record-batch.cpp 
b/c_glib/arrow-glib/record-batch.cpp
index f381af0a2..f23a0cf75 100644
--- a/c_glib/arrow-glib/record-batch.cpp
+++ b/c_glib/arrow-glib/record-batch.cpp
@@ -150,9 +150,8 @@ garrow_record_batch_new(GArrowSchema *schema,
   }
 
   auto arrow_record_batch =
-std::make_shared<arrow::RecordBatch>(garrow_schema_get_raw(schema),
- n_rows,
- arrow_columns);
+arrow::RecordBatch::Make(garrow_schema_get_raw(schema),
+ n_rows, arrow_columns);
   return garrow_record_batch_new_raw(&arrow_record_batch);
 }
 
diff --git a/c_glib/arrow-glib/table.cpp b/c_glib/arrow-glib/table.cpp
index 779f2ef62..e086396f8 100644
--- a/c_glib/arrow-glib/table.cpp
+++ b/c_glib/arrow-glib/table.cpp
@@ -143,8 +143,7 @@ garrow_table_new(GArrowSchema *schema,
   }
 
   auto arrow_table =
-std::make_shared<arrow::Table>(garrow_schema_get_raw(schema),
-   arrow_columns);
+arrow::Table::Make(garrow_schema_get_raw(schema), arrow_columns);
   return garrow_table_new_raw(&arrow_table);
 }
 
diff --git a/c_glib/test/test-file-writer.rb b/c_glib/test/test-file-writer.rb
index 3de8e5cf3..67aed85f7 100644
--- a/c_glib/test/test-file-writer.rb
+++ b/c_glib/test/test-file-writer.rb
@@ -19,14 +19,18 @@ class TestFileWriter < Test::Unit::TestCase
   include Helper::Buildable
 
   def test_write_record_batch
+data = [true]
+field = Arrow::Field.new("enabled", Arrow::BooleanDataType.new)
+schema = Arrow::Schema.new([field])
+
 tempfile = Tempfile.open("arrow-ipc-file-writer")
 output = Arrow::FileOutputStream.new(tempfile.path, false)
 begin
-  field = Arrow::Field.new("enabled", Arrow::BooleanDataType.new)
-  schema = Arrow::Schema.new([field])
   file_writer = Arrow::RecordBatchFileWriter.new(output, schema)
   begin
-record_batch = Arrow::RecordBatch.new(schema, 0, [])
+record_batch = Arrow::RecordBatch.new(schema,
+  data.size,
+  [build_boolean_array(data)])
 file_writer.write_record_batch(record_batch)
   ensure
 file_writer.close
@@ -38,8 +42,12 @@ def test_write_record_batch
 input = Arrow::MemoryMappedInputStream.new(tempfile.path)
 begin
   file_reader = Arrow::RecordBatchFileReader.new(input)
-  assert_equal(["enabled"],
+  assert_equal([field.name],
file_reader.schema.fields.collect(&:name))
+  assert_equal(Arrow::RecordBatch.new(schema,
+  data.size,
+  [build_boolean_array(data)]),
+   file_reader.read_record_batch(0))
 ensure
   input.close
 end
diff --git a/c_glib/test/test-gio-input-stream.rb 
b/c_glib/test/test-gio-input-stream.rb
index a71a37043..2adf25b3a 100644
--- a/c_glib/test/test-gio-input-stream.rb
+++ b/c_glib/test/test-gio-input-stream.rb
@@ -16,15 +16,21 @@
 # under the License.
 
 class TestGIOInputStream < Test::Unit::TestCase
+  include Helper::Buildable
+
   def test_reader_backend
+data = [true]
+field = Arrow::Field.new("enabled", Arrow::BooleanDataType.new)
+schema = Arrow::Schema.new([field])
+
 tempfile = Tempfile.open("arrow-gio-input-stream")
 output = Arrow::FileOutputStream.new(tempfile.path, false)
 begin
-  field = Arrow::Field.new("enabled", Arrow::BooleanDataType.new)
-  schema = Arrow::Schema.new([field])
   file_writer = 

[jira] [Resolved] (ARROW-1808) [C++] Make RecordBatch interface virtual to permit record batches that lazy-materialize columns

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1808.
-
Resolution: Fixed

Issue resolved by pull request 1337
[https://github.com/apache/arrow/pull/1337]

> [C++] Make RecordBatch interface virtual to permit record batches that 
> lazy-materialize columns
> ---
>
> Key: ARROW-1808
> URL: https://issues.apache.org/jira/browse/ARROW-1808
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This should be looked at soon to prevent having to define a different virtual 
> interface for record batches. There are places where we are using the record 
> batch constructor directly, and in some third party code (like MapD), so this 
> might be good to get done for 0.8.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261704#comment-16261704
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152436102
 
 

 ##
 File path: java/vector/src/main/codegen/data/ValueVectorTypes.tdd
 ##
 @@ -73,26 +73,10 @@
 { class: "UInt8" },
 { class: "Float8",   javaType: "double", boxedType: "Double", 
fields: [{name: "value", type: "double"}] },
 { class: "DateMilli",javaType: "long",  
friendlyType: "LocalDateTime" },
-{ class: "TimeStampSec", javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampMilli",   javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampMicro",   javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampNano",javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampSecTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.SECOND", "timezone"] },
-{ class: "TimeStampMilliTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.MILLISECOND", "timezone"] },
-{ class: "TimeStampMicroTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.MICROSECOND", "timezone"] },
-{ class: "TimeStampNanoTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.NANOSECOND", "timezone"] },
+{ class: "Timestamp",javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime"
 
 Review comment:
   It is used. The way it currently works is that if a timezone is specified, it will 
return the `LocalDateTime` in the specified time zone.
   
   For example:
   ```
   value = 0, timezone = null
   ```
   returns `LocalDateTime(1970-01-01 00:00:00)`
   
   ```
   value = 0, timezone = "America/New_York"
   ```
   returns `LocalDateTime(1969-12-31 19:00:00)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261703#comment-16261703
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152436102
 
 

 ##
 File path: java/vector/src/main/codegen/data/ValueVectorTypes.tdd
 ##
 @@ -73,26 +73,10 @@
 { class: "UInt8" },
 { class: "Float8",   javaType: "double", boxedType: "Double", 
fields: [{name: "value", type: "double"}] },
 { class: "DateMilli",javaType: "long",  
friendlyType: "LocalDateTime" },
-{ class: "TimeStampSec", javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampMilli",   javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampMicro",   javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampNano",javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampSecTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.SECOND", "timezone"] },
-{ class: "TimeStampMilliTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.MILLISECOND", "timezone"] },
-{ class: "TimeStampMicroTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.MICROSECOND", "timezone"] },
-{ class: "TimeStampNanoTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.NANOSECOND", "timezone"] },
+{ class: "Timestamp",javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime"
 
 Review comment:
   It is used. The way it currently works is that if a timezone is specified, it will 
just drop the timezone.
   
   For example:
   ```
   value = 0, timezone = null
   ```
   returns `LocalDateTime(1970-01-01 00:00:00)`
   
   ```
   value = 0, timezone = "America/New_York"
   ```
   returns `LocalDateTime(1969-12-31 19:00:00)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (ARROW-1826) [JAVA] Avoid branching at cell level (copyFrom)

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reopened ARROW-1826:
-

> [JAVA] Avoid branching at cell level (copyFrom)
> ---
>
> Key: ARROW-1826
> URL: https://issues.apache.org/jira/browse/ARROW-1826
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1826) [JAVA] Avoid branching at cell level (copyFrom)

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1826.
-
Resolution: Fixed

> [JAVA] Avoid branching at cell level (copyFrom)
> ---
>
> Key: ARROW-1826
> URL: https://issues.apache.org/jira/browse/ARROW-1826
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reopened ARROW-1559:
-
Assignee: Wes McKinney  (was: Uwe L. Korn)

> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> --
>
> Key: ARROW-1559
> URL: https://issues.apache.org/jira/browse/ARROW-1559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: Analytics, pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (ARROW-1838) [C++] Use compute::Datum uniformly for input argument to kernels

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reopened ARROW-1838:
-

> [C++] Use compute::Datum uniformly for input argument to kernels
> 
>
> Key: ARROW-1838
> URL: https://issues.apache.org/jira/browse/ARROW-1838
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is some API tidying after ARROW-1559. Some kernel APIs are still using 
> {{ArrayData}} for the input argument



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1559.
-
Resolution: Fixed

> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> --
>
> Key: ARROW-1559
> URL: https://issues.apache.org/jira/browse/ARROW-1559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: Analytics, pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1838) [C++] Use compute::Datum uniformly for input argument to kernels

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1838.
-
Resolution: Fixed

> [C++] Use compute::Datum uniformly for input argument to kernels
> 
>
> Key: ARROW-1838
> URL: https://issues.apache.org/jira/browse/ARROW-1838
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is some API tidying after ARROW-1559. Some kernel APIs are still using 
> {{ArrayData}} for the input argument



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261696#comment-16261696
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152435367
 
 

 ##
 File path: java/vector/pom.xml
 ##
 @@ -135,6 +135,13 @@
 <groupId>org.apache.drill.tools</groupId>
 <artifactId>drill-fmpp-maven-plugin</artifactId>
 <version>1.5.0</version>
+<dependencies>
+  <dependency>
+    <groupId>org.freemarker</groupId>
+    <artifactId>freemarker</artifactId>
+    <version>2.3.23</version>
+  </dependency>
+</dependencies>
 
 Review comment:
   This should just be a build dependency. Before, it was using 2.3.21.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261694#comment-16261694
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on issue #1330: wip: ARROW-1816: [Java] Resolve new vector 
classes structure for timestamp, date and maybe interval
URL: https://github.com/apache/arrow/pull/1330#issuecomment-346199159
 
 
   @BryanCutler Thanks for the code review. I will clean this up.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261692#comment-16261692
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on issue #1330: wip: ARROW-1816: [Java] Resolve new vector 
classes structure for timestamp, date and maybe interval
URL: https://github.com/apache/arrow/pull/1330#issuecomment-346198938
 
 
   @jacques-n During the refactoring, I discovered that the union vector doesn't 
support types with type params (i.e. Decimal and Timestamp), so I fixed that as 
well; union vectors now support decimal too.
   
   The PR now includes two major changes:
   * Consolidate all timestamp vectors into a single vector class
   * Add support for non-simple minor types (i.e., decimal, timestamp) in Union
   
   Most of the template changes are to support types with type params; most of 
them replace:
   ```
   <#if !minor.typeParams?? >
   // handle types with no type params
   </#if>
   ```
   with
   ```
   <#if !minor.typeParams?? >
   // handle types with no type params
   <#else>
   // handle types with type params
   </#if>
   ```
   Please take a look when you have a chance. Thanks! Hopefully we can clean this 
up and merge it the week after Thanksgiving.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1782) [Python] Expose compressors as pyarrow.compress, pyarrow.decompress

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261685#comment-16261685
 ] 

ASF GitHub Bot commented on ARROW-1782:
---

wesm commented on a change in pull request #1345: ARROW-1782: [Python] Add 
pyarrow.compress, decompress APIs
URL: https://github.com/apache/arrow/pull/1345#discussion_r152433696
 
 

 ##
 File path: cpp/src/arrow/util/compression.h
 ##
 @@ -27,7 +27,7 @@
 namespace arrow {
 
 struct Compression {
-  enum type { UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, ZSTD, LZ4 };
+  enum type { UNCOMPRESSED, SNAPPY, GZIP, BROTLI, ZSTD, LZ4, LZO };
 
 Review comment:
   We should probably remove LZO, but I think this is exposed in parquet-cpp 
right now


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose compressors as pyarrow.compress, pyarrow.decompress
> ---
>
> Key: ARROW-1782
> URL: https://issues.apache.org/jira/browse/ARROW-1782
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> These should release the GIL, and serve as an alternative to the various 
> compressor wrapper libraries out there. They should have the ability to work 
> with {{pyarrow.Buffer}} or {{PyBytes}} as the user prefers



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1782) [Python] Expose compressors as pyarrow.compress, pyarrow.decompress

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261684#comment-16261684
 ] 

ASF GitHub Bot commented on ARROW-1782:
---

wesm opened a new pull request #1345: ARROW-1782: [Python] Add 
pyarrow.compress, decompress APIs
URL: https://github.com/apache/arrow/pull/1345
 
 
   This enables bytes, Buffer, or buffer-like objects to be compressed either 
to PyBytes or `pyarrow.Buffer`. Wanted some feedback on the API (argument 
names, etc.). The compression API in Arrow in general requires knowing the size 
of the decompressed data, but some compressors (like Snappy) are able to tell 
you how big the result will be based only on the input buffer
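   
   A minimal usage sketch of the proposed functions, assuming illustrative keyword 
names (`codec`, `asbytes`, `decompressed_size`); the exact argument names are 
precisely what feedback is being requested on here, so treat them as placeholders:
   ```python
   import pyarrow as pa
   
   data = b"some bytes worth compressing" * 100
   
   # Compress a bytes / Buffer / buffer-like object; the result is a pyarrow.Buffer...
   compressed = pa.compress(data, codec="snappy")
   
   # ...or request PyBytes instead of a Buffer.
   compressed_bytes = pa.compress(data, codec="snappy", asbytes=True)
   
   # Most codecs need the decompressed size up front; Snappy can infer it from the
   # input frame, but passing it explicitly works for every codec.
   restored = pa.decompress(compressed, decompressed_size=len(data), codec="snappy")
   assert restored.to_pybytes() == data
   ```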


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose compressors as pyarrow.compress, pyarrow.decompress
> ---
>
> Key: ARROW-1782
> URL: https://issues.apache.org/jira/browse/ARROW-1782
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> These should release the GIL, and serve as an alternative to the various 
> compressor wrapper libraries out there. They should have the ability to work 
> with {{pyarrow.Buffer}} or {{PyBytes}} as the user prefers



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1782) [Python] Expose compressors as pyarrow.compress, pyarrow.decompress

2017-11-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1782:
--
Labels: pull-request-available  (was: )

> [Python] Expose compressors as pyarrow.compress, pyarrow.decompress
> ---
>
> Key: ARROW-1782
> URL: https://issues.apache.org/jira/browse/ARROW-1782
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> These should release the GIL, and serve as an alternative to the various 
> compressor wrapper libraries out there. They should have the ability to work 
> with {{pyarrow.Buffer}} or {{PyBytes}} as the user prefers



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261680#comment-16261680
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: wip: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152433253
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/NullableTimestampVector.java
 ##
 @@ -18,30 +18,83 @@
 
 package org.apache.arrow.vector;
 
+import com.google.common.base.Preconditions;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.joda.time.DateTimeZone;
+
 import io.netty.buffer.ArrowBuf;
 import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.complex.impl.TimestampReaderImpl;
+import org.apache.arrow.vector.complex.reader.FieldReader;
+import org.apache.arrow.vector.holders.NullableTimestampHolder;
+import org.apache.arrow.vector.holders.TimestampHolder;
+import org.apache.arrow.vector.types.Types;
+import org.apache.arrow.vector.types.pojo.ArrowType;
 import org.apache.arrow.vector.types.pojo.FieldType;
 import org.apache.arrow.vector.util.TransferPair;
+import org.joda.time.LocalDateTime;
 
 Review comment:
   We really need that checkstyle :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1268) [Website] Blog post on Arrow integration with Spark

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261673#comment-16261673
 ] 

ASF GitHub Bot commented on ARROW-1268:
---

wesm closed pull request #1344: ARROW-1268: [SITE][FOLLOWUP] Update Spark Post 
to Reflect Conf Change
URL: https://github.com/apache/arrow/pull/1344
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/site/_posts/2017-07-26-spark-arrow.md 
b/site/_posts/2017-07-26-spark-arrow.md
index c4b16c073..211e5a481 100644
--- a/site/_posts/2017-07-26-spark-arrow.md
+++ b/site/_posts/2017-07-26-spark-arrow.md
@@ -57,7 +57,7 @@ the conversion to Arrow data can be done on the JVM and 
pushed back for the Spark
 executors to perform in parallel, drastically reducing the load on the driver.
 
 As of the merging of [SPARK-13534][5], the use of Arrow when calling 
`toPandas()`
-needs to be enabled by setting the SQLConf "spark.sql.execution.arrow.enable" 
to
+needs to be enabled by setting the SQLConf "spark.sql.execution.arrow.enabled" 
to
 "true".  Let's look at a simple usage example.
 
 ```
@@ -84,7 +84,7 @@ In [2]: %time pdf = df.toPandas()
 CPU times: user 17.4 s, sys: 792 ms, total: 18.1 s
 Wall time: 20.7 s
 
-In [3]: spark.conf.set("spark.sql.execution.arrow.enable", "true")
+In [3]: spark.conf.set("spark.sql.execution.arrow.enabled", "true")
 
 In [4]: %time pdf = df.toPandas()
 CPU times: user 40 ms, sys: 32 ms, total: 72 ms
 
@@ -118,7 +118,7 @@ It is planned to add pyarrow as a pyspark dependency so that
 
 Currently, the controlling SQLConf is disabled by default. This can be enabled
 programmatically as in the example above or by adding the line
-"spark.sql.execution.arrow.enable=true" to 
`SPARK_HOME/conf/spark-defaults.conf`.
+"spark.sql.execution.arrow.enabled=true" to 
`SPARK_HOME/conf/spark-defaults.conf`.
 
 Also, not all Spark data types are currently supported and limited to primitive
 types. Expanded type support is in the works and expected to also be in the 
Spark


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Website] Blog post on Arrow integration with Spark
> ---
>
> Key: ARROW-1268
> URL: https://issues.apache.org/jira/browse/ARROW-1268
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1268) [Website] Blog post on Arrow integration with Spark

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261587#comment-16261587
 ] 

ASF GitHub Bot commented on ARROW-1268:
---

BryanCutler commented on issue #1344: ARROW-1268: [SITE][FOLLOWUP] Update Spark 
Post to Reflect Conf Change
URL: https://github.com/apache/arrow/pull/1344#issuecomment-346180824
 
 
   @wesm the Spark conf has changed since publishing, so I figured the post 
should be updated to reflect that


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Website] Blog post on Arrow integration with Spark
> ---
>
> Key: ARROW-1268
> URL: https://issues.apache.org/jira/browse/ARROW-1268
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1268) [Website] Blog post on Arrow integration with Spark

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261585#comment-16261585
 ] 

ASF GitHub Bot commented on ARROW-1268:
---

BryanCutler opened a new pull request #1344: ARROW-1268: [SITE][FOLLOWUP] 
Update Spark Post to Reflect Conf Change
URL: https://github.com/apache/arrow/pull/1344
 
 
   The Spark conf to enable arrow has changed from 
"spark.sql.execution.arrow.enable" to "spark.sql.execution.arrow.enabled"


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Website] Blog post on Arrow integration with Spark
> ---
>
> Key: ARROW-1268
> URL: https://issues.apache.org/jira/browse/ARROW-1268
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1268) [Website] Blog post on Arrow integration with Spark

2017-11-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1268:
--
Labels: pull-request-available  (was: )

> [Website] Blog post on Arrow integration with Spark
> ---
>
> Key: ARROW-1268
> URL: https://issues.apache.org/jira/browse/ARROW-1268
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261521#comment-16261521
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152356580
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   Do you think the same for file and json readers, e.g. `ipc.ArrowFileReader`? 
 I made these subpackages because there were some supporting files specific to 
just the file reader, so they could be grouped together.  But I'm ok either 
way, @icexelloss brought this up here 
https://github.com/apache/arrow/pull/1259#issuecomment-340562836


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261508#comment-16261508
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152406142
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   These classes are all quite similar (the file format is very nearly the 
stream format, plus a file footer and magic numbers at the start and end), so I 
think it would make sense to keep them in a flat package namespace (but I'm not 
a Java expert).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1703) [C++] Vendor exact version of jemalloc we depend on

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261506#comment-16261506
 ] 

ASF GitHub Bot commented on ARROW-1703:
---

wesm commented on issue #1334: ARROW-1703: [C++] Vendor exact version of 
jemalloc we depend on
URL: https://github.com/apache/arrow/pull/1334#issuecomment-346165933
 
 
   N.B. The problem with this vendoring strategy is that updates to jemalloc 
will bloat the repo size, whereas the vendoring procedure used in Redis and other 
projects would not. So if we end up needing to update jemalloc more than once 
in the future, we might check in the raw source directory so that incremental 
diffs don't cause a significant increase in repo size.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Vendor exact version of jemalloc we depend on
> ---
>
> Key: ARROW-1703
> URL: https://issues.apache.org/jira/browse/ARROW-1703
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since we are likely going to be using a patched jemalloc, we probably should 
> not support using jemalloc with any other version, or relying on system 
> packages. jemalloc would therefore always be built together with Arrow if 
> {{ARROW_JEMALLOC}} is on
> For this reason I believe we should vendor the code at the pinned commit as 
> with Redis and other projects: 
> https://github.com/antirez/redis/tree/unstable/deps



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1703) [C++] Vendor exact version of jemalloc we depend on

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1703.
-
Resolution: Fixed

Issue resolved by pull request 1334
[https://github.com/apache/arrow/pull/1334]

> [C++] Vendor exact version of jemalloc we depend on
> ---
>
> Key: ARROW-1703
> URL: https://issues.apache.org/jira/browse/ARROW-1703
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since we are likely going to be using a patched jemalloc, we probably should 
> not support using jemalloc with any other version, or relying on system 
> packages. jemalloc would therefore always be built together with Arrow if 
> {{ARROW_JEMALLOC}} is on
> For this reason I believe we should vendor the code at the pinned commit as 
> with Redis and other projects: 
> https://github.com/antirez/redis/tree/unstable/deps



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1703) [C++] Vendor exact version of jemalloc we depend on

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261501#comment-16261501
 ] 

ASF GitHub Bot commented on ARROW-1703:
---

wesm closed pull request #1334: ARROW-1703: [C++] Vendor exact version of 
jemalloc we depend on
URL: https://github.com/apache/arrow/pull/1334
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/LICENSE.txt b/LICENSE.txt
index 84e6a4e2a..30966d36f 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -552,3 +552,34 @@ distributed under the License is distributed on an "AS IS" 
BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
+
+
+
+This project includes code from the jemalloc project
+
+https://github.com/jemalloc/jemalloc
+
+Copyright (C) 2002-2017 Jason Evans .
+All rights reserved.
+Copyright (C) 2007-2012 Mozilla Foundation.  All rights reserved.
+Copyright (C) 2009-2017 Facebook, Inc.  All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice(s),
+   this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice(s),
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY EXPRESS
+OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
+EVENT SHALL THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake 
b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 42d7eddc9..411cf7584 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -24,7 +24,7 @@ set(GFLAGS_VERSION "2.2.0")
 set(GTEST_VERSION "1.8.0")
 set(GBENCHMARK_VERSION "1.1.0")
 set(FLATBUFFERS_VERSION "1.7.1")
-set(JEMALLOC_VERSION "4.4.0")
+set(JEMALLOC_VERSION "17c897976c60b0e6e4f4a365c751027244dada7a")
 set(SNAPPY_VERSION "1.1.3")
 set(BROTLI_VERSION "v0.6.0")
 set(LZ4_VERSION "1.7.5")
@@ -471,8 +471,8 @@ if (ARROW_JEMALLOC)
 set(JEMALLOC_STATIC_LIB 
"${JEMALLOC_PREFIX}/lib/libjemalloc_pic${CMAKE_STATIC_LIBRARY_SUFFIX}")
 set(JEMALLOC_VENDORED 1)
 ExternalProject_Add(jemalloc_ep
-  URL 
https://github.com/jemalloc/jemalloc/releases/download/${JEMALLOC_VERSION}/jemalloc-${JEMALLOC_VERSION}.tar.bz2
-  CONFIGURE_COMMAND ./configure "--prefix=${JEMALLOC_PREFIX}" 
"--with-jemalloc-prefix=je_arrow_" "--with-private-namespace=je_arrow_private_"
+  URL 
${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/jemalloc/${JEMALLOC_VERSION}.tar.gz
+  CONFIGURE_COMMAND ./autogen.sh "--prefix=${JEMALLOC_PREFIX}" 
"--with-jemalloc-prefix=je_arrow_" "--with-private-namespace=je_arrow_private_" 
&& touch doc/jemalloc.html && touch doc/jemalloc.3
   ${EP_LOG_OPTIONS}
   BUILD_IN_SOURCE 1
   BUILD_COMMAND ${MAKE}
diff --git 
a/cpp/thirdparty/jemalloc/17c897976c60b0e6e4f4a365c751027244dada7a.tar.gz 
b/cpp/thirdparty/jemalloc/17c897976c60b0e6e4f4a365c751027244dada7a.tar.gz
new file mode 100644
index 0..29d9266a1
Binary files /dev/null and 
b/cpp/thirdparty/jemalloc/17c897976c60b0e6e4f4a365c751027244dada7a.tar.gz differ
diff --git a/cpp/thirdparty/jemalloc/README.md 
b/cpp/thirdparty/jemalloc/README.md
new file mode 100644
index 0..272ff9c73
--- /dev/null
+++ b/cpp/thirdparty/jemalloc/README.md
@@ -0,0 +1,22 @@
+
+
+This directory contains a vendored commit from the jemalloc stable-4 branch.
+You can bump the version by downloading
+https://github.com/jemalloc/jemalloc/archive/{{ commit }}.tar.gz


 


This is an automated message from the Apache Git Service.

[jira] [Commented] (ARROW-1703) [C++] Vendor exact version of jemalloc we depend on

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261500#comment-16261500
 ] 

ASF GitHub Bot commented on ARROW-1703:
---

wesm commented on issue #1334: ARROW-1703: [C++] Vendor exact version of 
jemalloc we depend on
URL: https://github.com/apache/arrow/pull/1334#issuecomment-346165152
 
 
   Perhaps we can do it in one of the Linux builds for now. I'll merge this and 
we can deal with the reintroduction as the default in a separate patch


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Vendor exact version of jemalloc we depend on
> ---
>
> Key: ARROW-1703
> URL: https://issues.apache.org/jira/browse/ARROW-1703
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since we are likely going to be using a patched jemalloc, we probably should 
> not support using jemalloc with any other version, or relying on system 
> packages. jemalloc would therefore always be built together with Arrow if 
> {{ARROW_JEMALLOC}} is on
> For this reason I believe we should vendor the code at the pinned commit as 
> with Redis and other projects: 
> https://github.com/antirez/redis/tree/unstable/deps



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1842) ParquetDataset.read(): selectively reading array column

2017-11-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261496#comment-16261496
 ] 

Wes McKinney commented on ARROW-1842:
-

I think this is a duplicate of 
https://issues.apache.org/jira/browse/ARROW-1684. I think if you specify 
{{'c.element'}} it will read the column of interest, but please confirm

> ParquetDataset.read(): selectively reading array column
> ---
>
> Key: ARROW-1842
> URL: https://issues.apache.org/jira/browse/ARROW-1842
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Young-Jun Ko
>
> Scenario:
> - created a dataframe in spark and saved it as parquet
> - columns include simple types, e.g. String, but also an array of doubles
> Issue:
> I can read the whole data using ParquetDataset in pyarrow.
> I tried reading selectively a simple type => works
> I tried reading selectively the array column => key error in the following 
> place:
> KeyError: 'c'
> /home/hadoop/Python/lib/python2.7/site-packages/pyarrow/_parquet.pyx in 
> pyarrow._parquet.ParquetReader.column_name_idx 
> (/arrow/python/build/temp.linux-x86_64-2.7/_parquet.cxx:9777)()
> 513 self.column_idx_map[col_bytes] = i
> 514 
> --> 515 return self.column_idx_map[tobytes(column_name)]
> When I just read the whole dataset, I get the correct metadata
> pyarrow.Table
> a: string
> b: string
> c: list<element: double>
>   child 0, element: double
> d: int64
> metadata
> 
> {'org.apache.spark.sql.parquet.row.metadata': 
> '{"type":"struct","fields":[{"name":"a","type":"string","nullable":true,"metadata":{}},{"name":"b","type":"string","nullable":true,"metadata":{}},{"name":"c","type":{"type":"array","elementType":"double","containsNull":false},"nullable":true,"metadata":{}},{"name":"d","type":"long","nullable":false,"metadata":{}}]}'}
> I might just be missing the correct naming convention of the array column.
> But then this name should be reflected in the metadata.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1836) [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds

2017-11-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261493#comment-16261493
 ] 

Wes McKinney commented on ARROW-1836:
-

We can remove the static visitor, but the problem is that if we update this code 
to the latest from mapbox/variant, we'll have to re-apply the edits. We should 
probably raise the issue in https://github.com/mapbox/variant to see if it can 
be fixed at the source

> [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds
> 
>
> Key: ARROW-1836
> URL: https://issues.apache.org/jira/browse/ARROW-1836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Max Risuhin
> Fix For: 0.8.0
>
>
> [~Max Risuhin] can you look into this? This is leaking into downstream users 
> of Arrow. see e.g. 
> https://github.com/apache/parquet-cpp/pull/403/commits/8e40b7d7d8f161a14dfed70cb6d528e82ffa21a9
>  and build failures 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/parquet-cpp/build/1.0.443



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261470#comment-16261470
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on issue #1343: [WIP] ARROW-1832: [JS] Implement JSON reader 
for integration tests
URL: https://github.com/apache/arrow/pull/1343#issuecomment-346124815
 
 
   @theneuralbit awesome! Will look closer when I’m on my laptop. To fix the 
closure build issue, just update [lines 55 and 
58](https://github.com/apache/arrow/blob/master/js/gulp/typescript-task.js#L55) 
to the new generated JS files glob path.
   
   It’s hacky, but cc with advanced opts can’t track the property names through 
the nested function declaration IIFEs that TS compiles namespaces to while 
mangling, so we use the flatc generated JS files here instead.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261330#comment-16261330
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346132977
 
 
   If this is merged first, I would just delete different files; shouldn't be 
hard. If #1330 gets merged first, you need to rename NullableTimestampVector -> 
TimestampVector.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261324#comment-16261324
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346132669
 
 
   @BryanCutler There might be some conflicts but I think we should be fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261308#comment-16261308
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

BryanCutler commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346128441
 
 
   @icexelloss I didn't see you were already working on #1330, which is also a 
sizable change, before I started on this.  I'm not sure which one would be 
easier to merge first, any thoughts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261297#comment-16261297
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152366389
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/NullableTimestampVector.java
 ##
 @@ -18,30 +18,83 @@
 
 package org.apache.arrow.vector;
 
+import com.google.common.base.Preconditions;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.joda.time.DateTimeZone;
+
 import io.netty.buffer.ArrowBuf;
 import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.complex.impl.TimestampReaderImpl;
+import org.apache.arrow.vector.complex.reader.FieldReader;
+import org.apache.arrow.vector.holders.NullableTimestampHolder;
+import org.apache.arrow.vector.holders.TimestampHolder;
+import org.apache.arrow.vector.types.Types;
+import org.apache.arrow.vector.types.pojo.ArrowType;
 import org.apache.arrow.vector.types.pojo.FieldType;
 import org.apache.arrow.vector.util.TransferPair;
+import org.joda.time.LocalDateTime;
 
 Review comment:
   minor: maybe move this import up with DateTimeZone, and move the TimeUnit 
import down with the other Arrow imports


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261298#comment-16261298
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152363389
 
 

 ##
 File path: java/vector/src/main/codegen/templates/MapWriters.java
 ##
 @@ -242,7 +242,7 @@ public void end() {
   <#assign constructorParams = minor.arrowTypeConstructorParams />
 <#else>
   <#assign constructorParams = [] />
-  <#list minor.typeParams?reverse as typeParam>
+  <#list minor.typeParams as typeParam>
 
 Review comment:
   Why change this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261300#comment-16261300
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152365593
 
 

 ##
 File path: java/vector/src/main/codegen/templates/UnionVector.java
 ##
 @@ -138,16 +138,20 @@ private void setReaderAndWriterIndex() {
  throw new UnsupportedOperationException("There are no inner vectors. Use 
geFieldBuffers");
   }
 
-  private String fieldName(MinorType type) {
-return type.name().toLowerCase();
+  private String fieldName(ArrowType type) {
+return Types.getMinorTypeForArrowType(type).name().toLowerCase();
   }
 
-  private FieldType fieldType(MinorType type) {
-return FieldType.nullable(type.getType());
-  }
+  // private FieldType fieldType(MinorType type) {
+  // return FieldType.nullable(type.getType());
+  // }
+
+  // private <T extends FieldVector> T addOrGet(MinorType minorType, Class<T> c) {
+  // return internalMap.addOrGet(fieldName(minorType), fieldType(minorType), c);
+  // }
 
 Review comment:
   forgot to remove?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261296#comment-16261296
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152362306
 
 

 ##
 File path: java/vector/src/main/codegen/data/ValueVectorTypes.tdd
 ##
 @@ -73,26 +73,10 @@
 { class: "UInt8" },
 { class: "Float8",   javaType: "double", boxedType: "Double", 
fields: [{name: "value", type: "double"}] },
 { class: "DateMilli",javaType: "long",  
friendlyType: "LocalDateTime" },
-{ class: "TimeStampSec", javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampMilli",   javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampMicro",   javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampNano",javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime" },
-{ class: "TimeStampSecTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.SECOND", "timezone"] },
-{ class: "TimeStampMilliTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.MILLISECOND", "timezone"] },
-{ class: "TimeStampMicroTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.MICROSECOND", "timezone"] },
-{ class: "TimeStampNanoTZ", javaType: "long",   boxedType: "Long",
- typeParams: [ {name: "timezone", type: 
"String"} ],
- arrowType: 
"org.apache.arrow.vector.types.pojo.ArrowType.Timestamp",
- arrowTypeConstructorParams: 
["org.apache.arrow.vector.types.TimeUnit.NANOSECOND", "timezone"] },
+{ class: "Timestamp",javaType: "long",   boxedType: "Long", 
friendlyType: "LocalDateTime"
 
 Review comment:
   Is the `friendlyType` param still used any more?  If so then will 
`LocalDateTime` work with a timezone set?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261295#comment-16261295
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152361482
 
 

 ##
 File path: java/vector/src/main/codegen/data/ValueVectorTypes.tdd
 ##
 @@ -116,7 +100,7 @@
 {
   class: "Decimal",
   maxPrecisionDigits: 38, nDecimalDigits: 4, friendlyType: 
"BigDecimal",
-  typeParams: [ {name: "scale", type: "int"}, { name: "precision", 
type: "int"}],
+  typeParams: [ { name: "precision", type: "int"}, {name: "scale", 
type: "int"} ],
 
 Review comment:
   These were previously flipped in ARROW-1091 and then flipped again in 
ARROW-1092.  I thought they were right already, are they not?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261302#comment-16261302
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152362623
 
 

 ##
 File path: java/vector/pom.xml
 ##
 @@ -135,6 +135,13 @@
 org.apache.drill.tools
 drill-fmpp-maven-plugin
 1.5.0
+
+  
+org.freemarker
+freemarker
+2.3.23
+  
+
 
 Review comment:
   How come this needs to be added now?  If it does can it be scoped to compile 
phase?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261299#comment-16261299
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152370888
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/types/TimeUnit.java
 ##
 @@ -19,10 +19,10 @@
 package org.apache.arrow.vector.types;
 
 public enum TimeUnit {
-  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND),
-  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND),
-  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND),
-  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND);
+  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND, 
java.util.concurrent.TimeUnit.SECONDS),
+  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND, 
java.util.concurrent.TimeUnit.MILLISECONDS),
+  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND, 
java.util.concurrent.TimeUnit.MICROSECONDS),
+  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND, 
java.util.concurrent.TimeUnit.NANOSECONDS);
 
 Review comment:
   Would it be better to leave this as it was and just put a switch statement 
in `getTimeUnit` to return the corresponding java TimeUnit?
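
   A minimal sketch of that alternative, for illustration only (the flatbuf 
constants exist in arrow-format, but the field and accessor names below are 
assumptions rather than the code in this PR):
   ```java
   // Keep the enum constructor as it was and map to the JDK unit in a switch.
   public enum TimeUnit {
     SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND),
     MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND),
     MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND),
     NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND);

     private final short flatbufID;  // illustrative field name

     TimeUnit(short flatbufID) {
       this.flatbufID = flatbufID;
     }

     public short getFlatbufID() {
       return flatbufID;
     }

     // The suggested switch-based mapping, instead of a second constructor arg.
     public java.util.concurrent.TimeUnit getTimeUnit() {
       switch (this) {
         case SECOND:      return java.util.concurrent.TimeUnit.SECONDS;
         case MILLISECOND: return java.util.concurrent.TimeUnit.MILLISECONDS;
         case MICROSECOND: return java.util.concurrent.TimeUnit.MICROSECONDS;
         case NANOSECOND:  return java.util.concurrent.TimeUnit.NANOSECONDS;
         default:          throw new AssertionError("unreachable");
       }
     }
   }
   ```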


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261301#comment-16261301
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: wip: ARROW-1816: 
[Java] Resolve new vector classes structure for timestamp, date and maybe 
interval
URL: https://github.com/apache/arrow/pull/1330#discussion_r152369556
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/NullableTimestampVector.java
 ##
 @@ -124,6 +177,32 @@ public void setSafe(int index, long value) {
 set(index, value);
   }
 
+  public void setSafe(int index, NullableTimestampHolder holder) {
 
 Review comment:
   Do these set methods need to check that the type params are compatible?
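
   One way such a guard could look, sketched against the existing 
ArrowType.Timestamp POJO rather than the holders in this WIP (the helper class 
itself is hypothetical; a setSafe(int, holder) overload could run a check like 
this before writing):
   ```java
   import java.util.Objects;

   import org.apache.arrow.vector.types.pojo.ArrowType;

   import com.google.common.base.Preconditions;

   public final class TimestampTypeCheck {
     private TimestampTypeCheck() {}

     /** Fail fast if the value's unit/timezone do not match the vector's type. */
     public static void checkCompatible(ArrowType.Timestamp vectorType,
                                        ArrowType.Timestamp valueType) {
       Preconditions.checkArgument(vectorType.getUnit() == valueType.getUnit(),
           "time unit mismatch: vector=%s, value=%s",
           vectorType.getUnit(), valueType.getUnit());
       Preconditions.checkArgument(
           Objects.equals(vectorType.getTimezone(), valueType.getTimezone()),
           "timezone mismatch: vector=%s, value=%s",
           vectorType.getTimezone(), valueType.getTimezone());
     }
   }
   ```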


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261280#comment-16261280
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on issue #1343: [WIP] ARROW-1832: [JS] Implement JSON reader 
for integration tests
URL: https://github.com/apache/arrow/pull/1343#issuecomment-346124815
 
 
   @theneuralbit awesome! Will look closer when I’m on my laptop. To fix the 
closure build issue, just update [this 
line](https://github.com/apache/arrow/blob/master/js/gulp/typescript-task.js#L58)
 to the new generated JS files glob path.
   
   It’s hacky, but cc with advanced opts can’t track the property names through 
the nested function declaration IIFEs that TS compiles namespaces to while 
mangling, so we use the flatc generated JS files here instead.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261273#comment-16261273
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152368299
 
 

 ##
 File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java
 ##
 @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() {
   public void testBitVector() {
 // Create a new value vector for 1024 integers
 try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) 
{
-  final BitVector.Mutator m = vector.getMutator();
   vector.allocateNew(1024);
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
   // Put and set a few values
-  m.set(0, 1);
-  m.set(1, 0);
-  m.set(100, 0);
-  m.set(1022, 1);
+  vector.set(0, 1);
+  vector.set(1, 0);
+  vector.set(100, 0);
+  vector.set(1022, 1);
 
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
-  final BitVector.Accessor accessor = vector.getAccessor();
-  assertEquals(1, accessor.get(0));
-  assertEquals(0, accessor.get(1));
-  assertEquals(0, accessor.get(100));
-  assertEquals(1, accessor.get(1022));
+  assertEquals(1, vector.get(0));
+  assertEquals(0, vector.get(1));
+  assertEquals(0, vector.get(100));
+  assertEquals(1, vector.get(1022));
 
-  assertEquals(1022, accessor.getNullCount());
+  assertEquals(1020, vector.getNullCount());
 
 Review comment:
   I agree this is a misnomer. I think this is fine as long as it doesn't 
break Dremio. cc @siddharthteotia.
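
   For reference, a tiny standalone example of the semantics under discussion, 
assuming the post-refactor API shown in this diff (class name and allocator 
setup are illustrative): slots that were never set count as null, so setting 
4 of the 1024 slots leaves 1020 nulls.
   ```java
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.BitVector;

   public class BitVectorNullCountExample {
     public static void main(String[] args) {
       try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
            BitVector vector = new BitVector("bits", allocator)) {
         vector.allocateNew(1024);
         // Only four slots are ever set ...
         vector.set(0, 1);
         vector.set(1, 0);
         vector.set(100, 0);
         vector.set(1022, 1);
         vector.setValueCount(1024);
         // ... so the remaining 1020 slots report as null.
         System.out.println(vector.getNullCount());  // prints 1020
       }
     }
   }
   ```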


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261250#comment-16261250
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152364883
 
 

 ##
 File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java
 ##
 @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() {
   public void testBitVector() {
 // Create a new value vector for 1024 integers
 try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) 
{
-  final BitVector.Mutator m = vector.getMutator();
   vector.allocateNew(1024);
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
   // Put and set a few values
-  m.set(0, 1);
-  m.set(1, 0);
-  m.set(100, 0);
-  m.set(1022, 1);
+  vector.set(0, 1);
+  vector.set(1, 0);
+  vector.set(100, 0);
+  vector.set(1022, 1);
 
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
-  final BitVector.Accessor accessor = vector.getAccessor();
-  assertEquals(1, accessor.get(0));
-  assertEquals(0, accessor.get(1));
-  assertEquals(0, accessor.get(100));
-  assertEquals(1, accessor.get(1022));
+  assertEquals(1, vector.get(0));
+  assertEquals(0, vector.get(1));
+  assertEquals(0, vector.get(100));
+  assertEquals(1, vector.get(1022));
 
-  assertEquals(1022, accessor.getNullCount());
+  assertEquals(1020, vector.getNullCount());
 
 Review comment:
   Yeah that would make sense, but sort of a misnomer imo.  Should we include 
this change in the release notes somewhere?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261205#comment-16261205
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

TheNeuralBit commented on issue #1343: [WIP] ARROW-1832: [JS] Implement JSON 
reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#issuecomment-346114553
 
 
   @trxcllnt I seem to have broken the closure compiler build by adding 
`src/format/arrow.ts` - any idea what's wrong?
   
   Also, I'm interested in your feedback on some of the re-structuring I did. I 
made a [`FieldBuilder` and 
`FieldNodeBuilder`](https://github.com/apache/arrow/pull/1343/commits/4e9767aca59fe471e62c3b770f3946d51eab2012#diff-b4be157e615dd41b8b50ad1c6021e506R26)
 which replicate the flatbuffers `Field` and `FieldNode` interfaces, and then I 
made the `fieldMixin` constructor accept [either 
type](https://github.com/apache/arrow/pull/1343/commits/4e9767aca59fe471e62c3b770f3946d51eab2012#diff-7ae84ba988036361542a685bcd1d672dR22).
 That way I can create "field" vectors from the JSON reader without building 
flatbuffers objects.
   
   I thought something like this may be useful for a writer in the future, 
which is why I went ahead and stubbed out some [write() 
methods](https://github.com/apache/arrow/pull/1343/commits/4e9767aca59fe471e62c3b770f3946d51eab2012#diff-b4be157e615dd41b8b50ad1c6021e506R43).
 What do you think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261192#comment-16261192
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346112886
 
 
   @siddharthteotia whatever is easier for this, but I would like to hear that 
I didn't break anything on your side :)  It's pretty easy to rebase this, so no 
need to rush


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1832:
--
Labels: pull-request-available  (was: )

> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261183#comment-16261183
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

TheNeuralBit opened a new pull request #1343: [WIP] ARROW-1832: [JS] Implement 
JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343
 
 
   Add JSON reader, as well as `js/bin/integration.js` script for running 
integration test validation


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1703) [C++] Vendor exact version of jemalloc we depend on

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261028#comment-16261028
 ] 

ASF GitHub Bot commented on ARROW-1703:
---

xhochy commented on issue #1334: ARROW-1703: [C++] Vendor exact version of 
jemalloc we depend on
URL: https://github.com/apache/arrow/pull/1334#issuecomment-346088573
 
 
   I'm not sure if we have a free matrix entry anymore where we would be able 
to activate it again. I would like to (gradually) reintroduce `jemalloc` as the 
default again but we also should keep jemalloc-free builds in our test matrix.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Vendor exact version of jemalloc we depend on
> ---
>
> Key: ARROW-1703
> URL: https://issues.apache.org/jira/browse/ARROW-1703
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since we are likely going to be using a patched jemalloc, we probably should 
> not support using jemalloc with any other version, or relying on system 
> packages. jemalloc would therefore always be built together with Arrow if 
> {{ARROW_JEMALLOC}} is on
> For this reason I believe we should vendor the code at the pinned commit as 
> with Redis and other projects: 
> https://github.com/antirez/redis/tree/unstable/deps



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1842) ParquetDataset.read(): selectively reading array column

2017-11-21 Thread Young-Jun Ko (JIRA)
Young-Jun Ko created ARROW-1842:
---

 Summary: ParquetDataset.read(): selectively reading array column
 Key: ARROW-1842
 URL: https://issues.apache.org/jira/browse/ARROW-1842
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.7.1
Reporter: Young-Jun Ko


Scenario:
- created a dataframe in spark and saved it as parquet
- columns include simple types, e.g. String, but also an array of doubles

Issue:
I can read the whole data using ParquetDataset in pyarrow.
I tried reading selectively a simple type => works
I tried reading selectively the array column => key error in the following 
place:

KeyError: 'c'

/home/hadoop/Python/lib/python2.7/site-packages/pyarrow/_parquet.pyx in 
pyarrow._parquet.ParquetReader.column_name_idx 
(/arrow/python/build/temp.linux-x86_64-2.7/_parquet.cxx:9777)()
513 self.column_idx_map[col_bytes] = i
514 
--> 515 return self.column_idx_map[tobytes(column_name)]

When I just read the whole dataset, I get the correct metadata


pyarrow.Table
a: string
b: string
c: list<element: double>
  child 0, element: double
d: int64
metadata

{'org.apache.spark.sql.parquet.row.metadata': 
'{"type":"struct","fields":[{"name":"a","type":"string","nullable":true,"metadata":{}},{"name":"b","type":"string","nullable":true,"metadata":{}},{"name":"c","type":{"type":"array","elementType":"double","containsNull":false},"nullable":true,"metadata":{}},{"name":"d","type":"long","nullable":false,"metadata":{}}]}'}


I might just be missing the correct naming convention of the array column.
But then this name should be reflected in the metadata.

Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1841) [JS] Update text-encoding-utf-8 and tslib for node ESModules support

2017-11-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1841:
--
Labels: pull-request-available  (was: )

> [JS] Update text-encoding-utf-8 and tslib for node ESModules support
> 
>
> Key: ARROW-1841
> URL: https://issues.apache.org/jira/browse/ARROW-1841
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1841) [JS] Update text-encoding-utf-8 and tslib for node ESModules support

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260854#comment-16260854
 ] 

ASF GitHub Bot commented on ARROW-1841:
---

wesm closed pull request #1338: ARROW-1841: [JS] Update text-encoding-utf-8 and 
tslib for node ESModules support
URL: https://github.com/apache/arrow/pull/1338
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/js/gulp/package-task.js b/js/gulp/package-task.js
index 7b4b15a33..ad56d172e 100644
--- a/js/gulp/package-task.js
+++ b/js/gulp/package-task.js
@@ -49,6 +49,12 @@ const createMainPackageJson = (target, format) => (orig) => 
({
 browser: `${mainExport}.es5.min.js`,
 [`browser:es2015`]: `${mainExport}.es2015.min.js`,
 [`@std/esm`]: { esm: `mjs` },
+// Temporary workaround until https://github.com/Microsoft/tslib/pull/44 
is merged
+scripts: {
+postinstall: `npm i shx && npm run tslib_mjs && npm run tslib_pkg && 
npm r shx`,
+tslib_mjs: `shx cp $(node -e 
\"console.log(require.resolve('tslib/tslib.es6.js'))\") $(node -e \"var 
r=require,p=r('path');console.log(p.join(p.dirname(r.resolve('tslib')),'tslib.mjs'))\")`,
+tslib_pkg: `node -e \"var 
r=require,p=r('path'),f=r('fs'),k=p.join(p.dirname(r.resolve('tslib')),'package.json'),x=JSON.parse(f.readFileSync(k));x.main='tslib';f.writeFileSync(k,JSON.stringify(x))\"`
+}
 });
   
 const createTypeScriptPackageJson = (target, format) => (orig) => ({
diff --git a/js/package.json b/js/package.json
index 24bc27f5b..1a110b2d5 100644
--- a/js/package.json
+++ b/js/package.json
@@ -25,7 +25,8 @@
 "lint": "npm-run-all -p lint:*",
 "lint:src": "tslint --fix --project -p tsconfig.json -c tslint.json 
\"src/**/*.ts\"",
 "lint:test": "tslint --fix --project -p test/tsconfig.json -c tslint.json 
\"test/**/*.ts\"",
-"prepublishOnly": "echo \"Error: do 'npm run release' instead of 'npm 
publish'\" && exit 1"
+"prepublishOnly": "echo \"Error: do 'npm run release' instead of 'npm 
publish'\" && exit 1",
+"postinstall": "shx cp node_modules/tslib/tslib.es6.js 
node_modules/tslib/tslib.mjs"
   },
   "repository": {
 "type": "git",
@@ -54,7 +55,8 @@
   },
   "dependencies": {
 "flatbuffers": "trxcllnt/flatbuffers-esm",
-"text-encoding": "0.6.4"
+"text-encoding-utf-8": "^1.0.2",
+"tslib": "^1.8.0"
   },
   "devDependencies": {
 "@std/esm": "0.13.0",
@@ -90,10 +92,8 @@
 "rxjs": "5.5.2",
 "shx": "0.2.2",
 "source-map-loader": "0.2.3",
-"text-encoding-utf-8": "1.0.1",
 "trash": "4.1.0",
 "ts-jest": "21.2.1",
-"tslib": "1.8.0",
 "tslint": "5.8.0",
 "typescript": "2.6.1",
 "uglifyjs-webpack-plugin": "1.0.1",


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Update text-encoding-utf-8 and tslib for node ESModules support
> 
>
> Key: ARROW-1841
> URL: https://issues.apache.org/jira/browse/ARROW-1841
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1841) [JS] Update text-encoding-utf-8 and tslib for node ESModules support

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1841.
-
Resolution: Fixed

Issue resolved by pull request 1338
[https://github.com/apache/arrow/pull/1338]

> [JS] Update text-encoding-utf-8 and tslib for node ESModules support
> 
>
> Key: ARROW-1841
> URL: https://issues.apache.org/jira/browse/ARROW-1841
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1841) [JS] Update text-encoding-utf-8 and tslib for node ESModules support

2017-11-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1841:
---

 Summary: [JS] Update text-encoding-utf-8 and tslib for node 
ESModules support
 Key: ARROW-1841
 URL: https://issues.apache.org/jira/browse/ARROW-1841
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Wes McKinney
Assignee: Paul Taylor
 Fix For: 0.8.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1838) [C++] Use compute::Datum uniformly for input argument to kernels

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260851#comment-16260851
 ] 

ASF GitHub Bot commented on ARROW-1838:
---

wesm closed pull request #1339: ARROW-1838: [C++] Conform kernel API to use 
Datum for input and output
URL: https://github.com/apache/arrow/pull/1339
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/compute/compute-test.cc 
b/cpp/src/arrow/compute/compute-test.cc
index 58a991c60..fa408ae40 100644
--- a/cpp/src/arrow/compute/compute-test.cc
+++ b/cpp/src/arrow/compute/compute-test.cc
@@ -697,7 +697,7 @@ TEST_F(TestCast, PreallocatedMemory) {
   out_data->buffers.push_back(out_values);
 
   Datum out(out_data);
-  ASSERT_OK(kernel->Call(&this->ctx_, *arr->data(), &out));
+  ASSERT_OK(kernel->Call(&this->ctx_, Datum(arr), &out));
 
   // Buffer address unchanged
   ASSERT_EQ(out_values.get(), out_data->buffers[1].get());
diff --git a/cpp/src/arrow/compute/kernel.h b/cpp/src/arrow/compute/kernel.h
index 0037245d6..7ff506ca0 100644
--- a/cpp/src/arrow/compute/kernel.h
+++ b/cpp/src/arrow/compute/kernel.h
@@ -131,7 +131,7 @@ struct ARROW_EXPORT Datum {
 /// \brief An array-valued function of a single input argument
 class ARROW_EXPORT UnaryKernel : public OpKernel {
  public:
-  virtual Status Call(FunctionContext* ctx, const ArrayData& input, Datum* out) = 0;
+  virtual Status Call(FunctionContext* ctx, const Datum& input, Datum* out) = 0;
 };
 
 }  // namespace compute
diff --git a/cpp/src/arrow/compute/kernels/cast.cc 
b/cpp/src/arrow/compute/kernels/cast.cc
index c866054ea..d595d2ea5 100644
--- a/cpp/src/arrow/compute/kernels/cast.cc
+++ b/cpp/src/arrow/compute/kernels/cast.cc
@@ -740,20 +740,23 @@ class CastKernel : public UnaryKernel {
 can_pre_allocate_values_(can_pre_allocate_values),
 out_type_(out_type) {}
 
-  Status Call(FunctionContext* ctx, const ArrayData& input, Datum* out) override {
+  Status Call(FunctionContext* ctx, const Datum& input, Datum* out) override {
+DCHECK_EQ(Datum::ARRAY, input.kind());
+
+const ArrayData& in_data = *input.array();
 ArrayData* result;
 
 if (out->kind() == Datum::NONE) {
-  out->value = std::make_shared<ArrayData>(out_type_, input.length);
+  out->value = std::make_shared<ArrayData>(out_type_, in_data.length);
 }
 
 result = out->array().get();
 
 if (!is_zero_copy_) {
   RETURN_NOT_OK(
-  AllocateIfNotPreallocated(ctx, input, can_pre_allocate_values_, result));
+  AllocateIfNotPreallocated(ctx, in_data, can_pre_allocate_values_, result));
 }
-func_(ctx, options_, input, result);
+func_(ctx, options_, in_data, result);
 
 RETURN_IF_ERROR(ctx);
 return Status::OK();
diff --git a/cpp/src/arrow/compute/kernels/hash.cc 
b/cpp/src/arrow/compute/kernels/hash.cc
index 3af41609f..95f039932 100644
--- a/cpp/src/arrow/compute/kernels/hash.cc
+++ b/cpp/src/arrow/compute/kernels/hash.cc
@@ -658,8 +658,9 @@ class HashKernelImpl : public HashKernel {
   explicit HashKernelImpl(std::unique_ptr hasher)
   : hasher_(std::move(hasher)) {}
 
-  Status Call(FunctionContext* ctx, const ArrayData& input, Datum* out) override {
-RETURN_NOT_OK(Append(ctx, input));
+  Status Call(FunctionContext* ctx, const Datum& input, Datum* out) override {
+DCHECK_EQ(Datum::ARRAY, input.kind());
+RETURN_NOT_OK(Append(ctx, *input.array()));
 return Flush(out);
   }
 
diff --git a/cpp/src/arrow/compute/kernels/util-internal.cc 
b/cpp/src/arrow/compute/kernels/util-internal.cc
index df68637e0..28428bfcb 100644
--- a/cpp/src/arrow/compute/kernels/util-internal.cc
+++ b/cpp/src/arrow/compute/kernels/util-internal.cc
@@ -34,13 +34,13 @@ Status InvokeUnaryArrayKernel(FunctionContext* ctx, UnaryKernel* kernel,
                               const Datum& value, std::vector<Datum>* outputs) {
   if (value.kind() == Datum::ARRAY) {
     Datum output;
-    RETURN_NOT_OK(kernel->Call(ctx, *value.array(), &output));
+    RETURN_NOT_OK(kernel->Call(ctx, value, &output));
     outputs->push_back(output);
   } else if (value.kind() == Datum::CHUNKED_ARRAY) {
     const ChunkedArray& array = *value.chunked_array();
     for (int i = 0; i < array.num_chunks(); i++) {
       Datum output;
-      RETURN_NOT_OK(kernel->Call(ctx, *(array.chunk(i)->data()), &output));
+      RETURN_NOT_OK(kernel->Call(ctx, Datum(array.chunk(i)), &output));
       outputs->push_back(output);
     }
   } else {
   outputs->push_back(output);
 }
   } else {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:

[jira] [Resolved] (ARROW-1838) [C++] Use compute::Datum uniformly for input argument to kernels

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1838.
-
Resolution: Fixed

Issue resolved by pull request 1339
[https://github.com/apache/arrow/pull/1339]

> [C++] Use compute::Datum uniformly for input argument to kernels
> 
>
> Key: ARROW-1838
> URL: https://issues.apache.org/jira/browse/ARROW-1838
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is some API tidying after ARROW-1559. Some kernel APIs are still using 
> {{ArrayData}} for the input argument



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1838) [C++] Use compute::Datum uniformly for input argument to kernels

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260849#comment-16260849
 ] 

ASF GitHub Bot commented on ARROW-1838:
---

wesm commented on issue #1339: ARROW-1838: [C++] Conform kernel API to use 
Datum for input and output
URL: https://github.com/apache/arrow/pull/1339#issuecomment-346051820
 
 
   +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Use compute::Datum uniformly for input argument to kernels
> 
>
> Key: ARROW-1838
> URL: https://issues.apache.org/jira/browse/ARROW-1838
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is some API tidying after ARROW-1559. Some kernel APIs are still using 
> {{ArrayData}} for the input argument



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1840) [Website] The installation command failed on Windows10 anaconda environment.

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260846#comment-16260846
 ] 

ASF GitHub Bot commented on ARROW-1840:
---

wesm closed pull request #1342: ARROW-1840: [Website] The installation command 
failed on Windows10 anaconda envir…
URL: https://github.com/apache/arrow/pull/1342
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/site/install.md b/site/install.md
index 0ef2008db..67b26983b 100644
--- a/site/install.md
+++ b/site/install.md
@@ -53,7 +53,7 @@ Install them with:
 
 ```shell
 conda install arrow-cpp=0.7.* -c conda-forge
-conda install pyarrow==0.7.* -c conda-forge
+conda install pyarrow=0.7.* -c conda-forge
 ```
 
 ### Python Wheels on PyPI (Unofficial)


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Website] The installation command failed on Windows10 anaconda environment.
> 
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In https://arrow.apache.org/install/ , this command failed in my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the following command, the installation succeeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1840) [Website] The installation command failed on Windows10 anaconda environment.

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1840.
-
Resolution: Fixed

Issue resolved by pull request 1342
[https://github.com/apache/arrow/pull/1342]

> [Website] The installation command failed on Windows10 anaconda environment.
> 
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In https://arrow.apache.org/install/ , this command failed in my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the following command, the installation succeeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1840) [Website] The installation command failed on Windows10 anaconda environment.

2017-11-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1840:

Fix Version/s: 0.8.0

> [Website] The installation command failed on Windows10 anaconda environment.
> 
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In https://arrow.apache.org/install/ , this command failed in my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the following command, the installation succeeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1808) [C++] Make RecordBatch interface virtual to permit record batches that lazy-materialize columns

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260841#comment-16260841
 ] 

ASF GitHub Bot commented on ARROW-1808:
---

wesm commented on issue #1337: ARROW-1808: [C++] Make RecordBatch, Table 
virtual interfaces for column access
URL: https://github.com/apache/arrow/pull/1337#issuecomment-346050608
 
 
   @kou it seems like there are two different issues here. 
   
   Here, a schema with 1 field was passed along with a list of 0 columns:
   
   ```diff
   -record_batch = Arrow::RecordBatch.new(schema, 0, [])
   +record_batch = Arrow::RecordBatch.new(schema,
   +  data.size,
   +  [build_boolean_array(data)])
   ```
   
   I believe this would result in segfaults even if the number of rows is 
non-zero. So having empty / length-0 record batches in the IPC writer code path 
is fine so long as the columns match the schema. 
   
   The reason this bug was not caught before was that the 
`RecordBatch::columns_` member was being used to determine 
`RecordBatch::num_columns()`, whereas now we are using the schema. It seems 
like respecting the schema is the right approach. I could add bounds checking to 
`SimpleRecordBatch::column(i)` and return null if the index is out of bounds; 
would that help at all?
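
   To make the bounds-checking suggestion concrete, here is a minimal sketch (class name and members are illustrative, not the actual Arrow implementation) of a record batch whose width comes from the schema and whose `column(i)` returns null instead of reading past a shorter-than-schema column vector:

```cpp
// Sketch only: width comes from the schema; column access is bounds-checked.
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

#include "arrow/array.h"
#include "arrow/type.h"

namespace example {

class BoundsCheckedRecordBatch {
 public:
  BoundsCheckedRecordBatch(std::shared_ptr<arrow::Schema> schema, int64_t num_rows,
                           std::vector<std::shared_ptr<arrow::Array>> columns)
      : schema_(std::move(schema)), num_rows_(num_rows), columns_(std::move(columns)) {}

  // The schema, not the columns vector, is the source of truth for the width.
  int num_columns() const { return schema_->num_fields(); }

  int64_t num_rows() const { return num_rows_; }

  // Returns null rather than crashing when fewer arrays were supplied than the
  // schema declares (the mismatch described in the comment above).
  std::shared_ptr<arrow::Array> column(int i) const {
    if (i < 0 || static_cast<size_t>(i) >= columns_.size()) {
      return nullptr;
    }
    return columns_[i];
  }

 private:
  std::shared_ptr<arrow::Schema> schema_;
  int64_t num_rows_;
  std::vector<std::shared_ptr<arrow::Array>> columns_;
};

}  // namespace example
```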


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Make RecordBatch interface virtual to permit record batches that 
> lazy-materialize columns
> ---
>
> Key: ARROW-1808
> URL: https://issues.apache.org/jira/browse/ARROW-1808
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This should be looked at soon to prevent having to define a different virtual 
> interface for record batches. There are places where we are using the record 
> batch constructor directly, and in some third party code (like MapD), so this 
> might be good to get done for 0.8.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1840) [Website] The installation command failed on Windows10 anaconda environment.

2017-11-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1840:
--
Labels: pull-request-available  (was: )

> [Website] The installation command failed on Windows10 anaconda environment.
> 
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>  Labels: pull-request-available
>
> In https://arrow.apache.org/install/ , this command failed in my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the following command, the installation succeeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1840) [Website] The installation command failed on Windows10 anaconda environment.

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260776#comment-16260776
 ] 

ASF GitHub Bot commented on ARROW-1840:
---

ksdevlife opened a new pull request #1342: ARROW-1840: [Website] The 
installation command failed on Windows10 anaconda envir…
URL: https://github.com/apache/arrow/pull/1342
 
 
   …onment


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Website] The installation command failed on Windows10 anaconda environment.
> 
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>  Labels: pull-request-available
>
> In https://arrow.apache.org/install/ , this command failed in my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the following command, the installation succeeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1830) [Python] Error when loading all the files in a directory

2017-11-21 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-1830.

Resolution: Fixed

Issue resolved by pull request 1340
[https://github.com/apache/arrow/pull/1340]

> [Python] Error when loading all the files in a directory
> -
>
> Key: ARROW-1830
> URL: https://issues.apache.org/jira/browse/ARROW-1830
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
> Environment: Python 2.7.11 (default, Jan 22 2016, 08:29:18)  + 
> pyarrow 0.7.1
>Reporter: DB Tsai
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> I can read one parquet file, but when I tried to read all the parquet files 
> in a folder, I got an error.
> {code:java}
> >>> data = 
> >>> pq.ParquetDataset('./aaa/part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
> >>> data = pq.ParquetDataset('./aaa/')
> Ignoring path: ./aaa//part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, 
> in __init__
> self.validate_schemas()
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, 
> in validate_schemas
> self.schema = self.pieces[0].get_metadata(open_file).schema
> IndexError: list index out of range
> >>> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1830) [Python] Error when loading all the files in a directory

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260724#comment-16260724
 ] 

ASF GitHub Bot commented on ARROW-1830:
---

xhochy closed pull request #1340: ARROW-1830: [Python] Relax restriction that 
Parquet files in a dataset end in .parq or .parquet
URL: https://github.com/apache/arrow/pull/1340
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/pyarrow/parquet.py b/python/pyarrow/parquet.py
index 3023e1771..37da66280 100644
--- a/python/pyarrow/parquet.py
+++ b/python/pyarrow/parquet.py
@@ -421,10 +421,6 @@ def read(self, columns=None, nthreads=1, partitions=None,
         return table
 
 
-def _is_parquet_file(path):
-    return path.endswith('parq') or path.endswith('parquet')
-
-
 class PartitionSet(object):
     """A data structure for cataloguing the observed Parquet partitions at a
     particular level. So if we have
@@ -556,14 +552,14 @@ def _visit_level(self, level, base_path, part_keys):
         filtered_files = []
         for path in files:
             full_path = self.pathsep.join((base_path, path))
-            if _is_parquet_file(path):
-                filtered_files.append(full_path)
-            elif path.endswith('_common_metadata'):
+            if path.endswith('_common_metadata'):
                 self.common_metadata_path = full_path
             elif path.endswith('_metadata'):
                 self.metadata_path = full_path
-            elif not self._should_silently_exclude(path):
+            elif self._should_silently_exclude(path):
                 print('Ignoring path: {0}'.format(full_path))
+            else:
+                filtered_files.append(full_path)
 
         # ARROW-1079: Filter out "private" directories starting with underscore
         filtered_directories = [self.pathsep.join((base_path, x))
diff --git a/python/pyarrow/tests/test_parquet.py b/python/pyarrow/tests/test_parquet.py
index 522815fce..274ff458f 100644
--- a/python/pyarrow/tests/test_parquet.py
+++ b/python/pyarrow/tests/test_parquet.py
@@ -1020,7 +1020,7 @@ def _visit_level(base_dir, level, part_keys):
 
         if level == DEPTH - 1:
             # Generate example data
-            file_path = pjoin(level_dir, 'data.parq')
+            file_path = pjoin(level_dir, guid())
 
             filtered_df = _filter_partition(df, this_part_keys)
             part_table = pa.Table.from_pandas(filtered_df)


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Error when loading all the files in a directory
> -
>
> Key: ARROW-1830
> URL: https://issues.apache.org/jira/browse/ARROW-1830
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
> Environment: Python 2.7.11 (default, Jan 22 2016, 08:29:18)  + 
> pyarrow 0.7.1
>Reporter: DB Tsai
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> I can read one parquet file, but when I tried to read all the parquet files 
> in a folder, I got an error.
> {code:java}
> >>> data = 
> >>> pq.ParquetDataset('./aaa/part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
> >>> data = pq.ParquetDataset('./aaa/')
> Ignoring path: ./aaa//part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, 
> in __init__
> self.validate_schemas()
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, 
> in validate_schemas
> self.schema = self.pieces[0].get_metadata(open_file).schema
> IndexError: list index out of range
> >>> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1840) [Website] The installation command failed on Windows10 anaconda environment.

2017-11-21 Thread Kazuhiro Sodebayashi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260653#comment-16260653
 ] 

Kazuhiro Sodebayashi commented on ARROW-1840:
-

I will make a pull request myself.

> [Website] The installation command failed on Windows10 anaconda environment.
> 
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>
> In https://arrow.apache.org/install/ , this command failed in my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the following command, the installation succeeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1836) [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds

2017-11-21 Thread Max Risuhin (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260649#comment-16260649
 ] 

Max Risuhin commented on ARROW-1836:


[~wesmckinn] Could you say whether it would be fine to completely remove the definition 
of the deprecated `static_visitor` struct from 
[here|https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/variant.h#L105]? 
Arrow compiles fine with it removed.
If we need to keep the deprecated construct for compatibility reasons, we will need to 
suppress C4996 with pragmas directly in the source code, since it is just 
a warning that the compiler found something deprecated.
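
For reference, a minimal sketch of the pragma-based suppression described above, using MSVC's standard `#pragma warning` push/disable/pop mechanism; this is one possible approach, not the fix that was actually committed:

```cpp
// Sketch only: keep the deprecated static_visitor available for compatibility,
// but stop MSVC from emitting C4996 to every downstream build that includes
// the header. Other compilers ignore these pragmas because of the _MSC_VER guard.
#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 4996)  // C4996: '...' was declared deprecated
#endif

// ... the deprecated static_visitor definition and any internal uses of it ...

#if defined(_MSC_VER)
#pragma warning(pop)
#endif
```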

> [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds
> 
>
> Key: ARROW-1836
> URL: https://issues.apache.org/jira/browse/ARROW-1836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Max Risuhin
> Fix For: 0.8.0
>
>
> [~Max Risuhin] can you look into this? This is leaking into downstream users 
> of Arrow. see e.g. 
> https://github.com/apache/parquet-cpp/pull/403/commits/8e40b7d7d8f161a14dfed70cb6d528e82ffa21a9
>  and build failures 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/parquet-cpp/build/1.0.443



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (ARROW-1836) [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds

2017-11-21 Thread Max Risuhin (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260649#comment-16260649
 ] 

Max Risuhin edited comment on ARROW-1836 at 11/21/17 12:28 PM:
---

[~wesmckinn] Could you say whether it would be fine to completely remove the definition 
of the deprecated `static_visitor` struct from 
[here|https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/variant.h#L105]? 
Arrow compiles fine with it removed.
If we need to keep the deprecated construct for compatibility reasons, we will need to 
suppress C4996 with pragmas directly in the source code, since it is just 
a warning that the compiler found something deprecated.


was (Author: max risuhin):
[~wesmckinn] Could you please say, do you have an idea if it will be fine just 
to completely remove definition of deprecated `static_visitor` struct from 
[here|https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/variant.h#L105]
 ? Arrow compiles fine if it's removed.
If we need to keep deprecated constraints for compatibility reasons, it will 
need to deprecated C4996 with pragmas directly in source code, since it's just 
a warning that something deprecated was found by compiler.

> [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds
> 
>
> Key: ARROW-1836
> URL: https://issues.apache.org/jira/browse/ARROW-1836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Max Risuhin
> Fix For: 0.8.0
>
>
> [~Max Risuhin] can you look into this? This is leaking into downstream users 
> of Arrow. see e.g. 
> https://github.com/apache/parquet-cpp/pull/403/commits/8e40b7d7d8f161a14dfed70cb6d528e82ffa21a9
>  and build failures 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/parquet-cpp/build/1.0.443



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1840) [Website] The installation command failed on Windows10 anaconda environment.

2017-11-21 Thread Kazuhiro Sodebayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuhiro Sodebayashi updated ARROW-1840:

Description: 
In https://arrow.apache.org/install/ , this command failed in my environment.
{{conda install pyarrow==0.7.* -c conda-forge}}

Output:

{noformat}
PackageNotFoundError: Packages missing in current channels:

  - pyarrow ==0.7.*

We have searched for the packages in the following channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.continuum.io/pkgs/main/win-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/win-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/win-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/win-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://repo.continuum.io/pkgs/msys2/win-64
  - https://repo.continuum.io/pkgs/msys2/noarch

{noformat}

If I input the following command, the installation succeeded.
{{conda install pyarrow=0.7.* -c conda-forge}}

The difference between them is "==" and "=".


  was:
In https://arrow.apache.org/install/ ,  command is failed my environment.
{{conda install pyarrow==0.7.* -c conda-forge}}

Output:

{noformat}
PackageNotFoundError: Packages missing in current channels:

  - pyarrow ==0.7.*

We have searched for the packages in the following channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.continuum.io/pkgs/main/win-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/win-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/win-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/win-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://repo.continuum.io/pkgs/msys2/win-64
  - https://repo.continuum.io/pkgs/msys2/noarch

{noformat}

If I input the follwing command, the installation is succeded.
{{conda install pyarrow=0.7.* -c conda-forge}}

The difference between them is "==" and "=".


Summary: [Website] The installation command failed on Windows10 
anaconda environment.  (was: [Website] The instration command failed on 
Windows10 anaconda environment.)

> [Website] The installation command failed on Windows10 anaconda environment.
> 
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>
> In https://arrow.apache.org/install/ , this command failed in my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the following command, the installation succeeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1840) [Website] The instration command failed on Windows10 anaconda environment.

2017-11-21 Thread Kazuhiro Sodebayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuhiro Sodebayashi updated ARROW-1840:

Description: 
In https://arrow.apache.org/install/ ,  command is failed my environment.
{{conda install pyarrow==0.7.* -c conda-forge}}

Output:

{noformat}
PackageNotFoundError: Packages missing in current channels:

  - pyarrow ==0.7.*

We have searched for the packages in the following channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.continuum.io/pkgs/main/win-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/win-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/win-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/win-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://repo.continuum.io/pkgs/msys2/win-64
  - https://repo.continuum.io/pkgs/msys2/noarch

{noformat}

If I input the follwing command, the installation is succeded.
{{conda install pyarrow=0.7.* -c conda-forge}}

The difference between them is "==" and "=".


  was:
In https://arrow.apache.org/install/ ,  commend is failed my environment.
{{conda install pyarrow==0.7.* -c conda-forge}}

Output:

{noformat}
PackageNotFoundError: Packages missing in current channels:

  - pyarrow ==0.7.*

We have searched for the packages in the following channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.continuum.io/pkgs/main/win-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/win-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/win-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/win-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://repo.continuum.io/pkgs/msys2/win-64
  - https://repo.continuum.io/pkgs/msys2/noarch

{noformat}

If I input the follwing command, the instration is succeded.
{{conda install pyarrow=0.7.* -c conda-forge}}

The difference between them is "==" and "=".



> [Website] The instration command failed on Windows10 anaconda environment.
> --
>
> Key: ARROW-1840
> URL: https://issues.apache.org/jira/browse/ARROW-1840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 0.7.1
> Environment: Windows10 and Anaconda
>Reporter: Kazuhiro Sodebayashi
>Priority: Minor
>
> In https://arrow.apache.org/install/ ,  command is failed my environment.
> {{conda install pyarrow==0.7.* -c conda-forge}}
> Output:
> {noformat}
> PackageNotFoundError: Packages missing in current channels:
>   - pyarrow ==0.7.*
> We have searched for the packages in the following channels:
>   - https://conda.anaconda.org/conda-forge/win-64
>   - https://conda.anaconda.org/conda-forge/noarch
>   - https://repo.continuum.io/pkgs/main/win-64
>   - https://repo.continuum.io/pkgs/main/noarch
>   - https://repo.continuum.io/pkgs/free/win-64
>   - https://repo.continuum.io/pkgs/free/noarch
>   - https://repo.continuum.io/pkgs/r/win-64
>   - https://repo.continuum.io/pkgs/r/noarch
>   - https://repo.continuum.io/pkgs/pro/win-64
>   - https://repo.continuum.io/pkgs/pro/noarch
>   - https://repo.continuum.io/pkgs/msys2/win-64
>   - https://repo.continuum.io/pkgs/msys2/noarch
> {noformat}
> If I input the follwing command, the installation is succeded.
> {{conda install pyarrow=0.7.* -c conda-forge}}
> The difference between them is "==" and "=".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1840) [Website] The instration command failed on Windows10 anaconda environment.

2017-11-21 Thread Kazuhiro Sodebayashi (JIRA)
Kazuhiro Sodebayashi created ARROW-1840:
---

 Summary: [Website] The instration command failed on Windows10 
anaconda environment.
 Key: ARROW-1840
 URL: https://issues.apache.org/jira/browse/ARROW-1840
 Project: Apache Arrow
  Issue Type: Bug
  Components: Website
Affects Versions: 0.7.1
 Environment: Windows10 and Anaconda
Reporter: Kazuhiro Sodebayashi
Priority: Minor


In https://arrow.apache.org/install/ ,  commend is failed my environment.
{{conda install pyarrow==0.7.* -c conda-forge}}

Output:

{noformat}
PackageNotFoundError: Packages missing in current channels:

  - pyarrow ==0.7.*

We have searched for the packages in the following channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.continuum.io/pkgs/main/win-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/win-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/win-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/win-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://repo.continuum.io/pkgs/msys2/win-64
  - https://repo.continuum.io/pkgs/msys2/noarch

{noformat}

If I input the follwing command, the instration is succeded.
{{conda install pyarrow=0.7.* -c conda-forge}}

The difference between them is "==" and "=".




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)