[jira] [Commented] (ARROW-1725) [Packaging] Upload .deb for Ubuntu 17.10

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218133#comment-16218133
 ] 

ASF GitHub Bot commented on ARROW-1725:
---

kou commented on issue #17: ARROW-1725: Upload .deb for Ubuntu 17.10
URL: https://github.com/apache/arrow-dist/pull/17#issuecomment-339220628
 
 
   +1
   
   We can't test whether the upload succeeds from a pull request. We need to 
use the master branch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Packaging] Upload .deb for Ubuntu 17.10
> 
>
> Key: ARROW-1725
> URL: https://issues.apache.org/jira/browse/ARROW-1725
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1725) [Packaging] Upload .deb for Ubuntu 17.10

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218135#comment-16218135
 ] 

ASF GitHub Bot commented on ARROW-1725:
---

kou closed pull request #17: ARROW-1725: Upload .deb for Ubuntu 17.10
URL: https://github.com/apache/arrow-dist/pull/17
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/.travis.yml b/.travis.yml
index 048d464..098afe6 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -54,7 +54,7 @@ matrix:
 before_install:
 install:
 script:
-- (cd cpp-linux && rake apt:build DEBUG=no)
+- (cd cpp-linux && travis_wait 40 rake apt:build PARALLEL=yes DEBUG=no)
 deploy:
   provider: bintray
   file: cpp-linux/apt/descriptor.json
diff --git a/cpp-linux/apt/descriptor.json b/cpp-linux/apt/descriptor.json
index fde23e4..9e1ff7a 100644
--- a/cpp-linux/apt/descriptor.json
+++ b/cpp-linux/apt/descriptor.json
@@ -39,6 +39,16 @@
 "deb_architecture": "amd64",
 "override": 1
 }
+},
+{
+"includePattern": 
"cpp-linux/apt/repositories/([^/]+)/pool/artful/universe/a/apache-arrow/([^/]+\\.deb)\\z",
+"uploadPattern": "pool/artful/universe/$2",
+"matrixParams": {
+"deb_distribution": "artful",
+"deb_component": "universe",
+"deb_architecture": "amd64",
+"override": 1
+}
 }
 ],
 "publish": true
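
To illustrate how the new "artful" entry maps a built package to its upload location, here is a minimal Python sketch. It assumes the deploy step treats includePattern as a regular expression over each local path and substitutes capture groups ($1, $2) into uploadPattern; that behaviour, the helper name upload_path, and the example file name are assumptions for illustration, not taken from the pull request.

```python
import re

# Pattern copied from the "artful" entry in descriptor.json, with two changes
# for Python: JSON's doubled backslashes are undoubled, and the Ruby-style
# end anchor \z is written as \Z.
INCLUDE_PATTERN = re.compile(
    r"cpp-linux/apt/repositories/([^/]+)/pool/artful/universe/a/apache-arrow/([^/]+\.deb)\Z"
)
UPLOAD_PATTERN = "pool/artful/universe/$2"


def upload_path(local_path):
    """Return the upload path for a built package, or None if it is not matched."""
    match = INCLUDE_PATTERN.search(local_path)
    if match is None:
        return None
    # $1 and $2 in uploadPattern refer to the capture groups of includePattern.
    return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), UPLOAD_PATTERN)


# Hypothetical file name, used only to exercise the pattern.
example = (
    "cpp-linux/apt/repositories/ubuntu/pool/artful/universe/a/apache-arrow/"
    "libarrow0_0.7.1-1_amd64.deb"
)
print(upload_path(example))
```

Under these assumptions, packages for other distributions (for example a path containing pool/zesty/) do not match this entry and are left to the other entries in the file.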


 




> [Packaging] Upload .deb for Ubuntu 17.10
> 
>
> Key: ARROW-1725
> URL: https://issues.apache.org/jira/browse/ARROW-1725
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1484) [C++] Implement (safe and unsafe) casts between timestamps and times of different units

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218062#comment-16218062
 ] 

ASF GitHub Bot commented on ARROW-1484:
---

wesm opened a new pull request #1245: ARROW-1484: [C++/Python] Implement casts 
between date, time, timestamp units
URL: https://github.com/apache/arrow/pull/1245
 
 
   Several JIRAs here that made sense to tackle together:
   
   * ARROW-1680
   * ARROW-1482
   * ARROW-1484
   * ARROW-1524
   
   This also fixes bugs relating to ignoring the offset in sliced arrays in 
some of the cast kernel implementations.
   
   cc @BryanCutler @xhochy @cpcloud 
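
As a rough illustration of what "safe" and "unsafe" casts between timestamp units mean, here is a minimal pure-Python sketch. It is not the C++ kernel from this pull request: the function name cast_timestamps, the use of None for nulls, and the floor-division behaviour in the unsafe path are all assumptions made for the example, and integer overflow is not modelled because Python integers are unbounded.

```python
# Number of each unit in one second.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def cast_timestamps(values, from_unit, to_unit, safe=True):
    """Convert integer timestamps between units; None marks a null slot."""
    src = UNITS_PER_SECOND[from_unit]
    dst = UNITS_PER_SECOND[to_unit]
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif dst >= src:
            # Coarse to fine: multiply, no precision is lost.
            out.append(v * (dst // src))
        else:
            # Fine to coarse: divide, which may drop a remainder.
            factor = src // dst
            if safe and v % factor != 0:
                raise ValueError(
                    f"casting {v} {from_unit} to {to_unit} would lose data"
                )
            out.append(v // factor)  # floor division in this sketch
    return out


print(cast_timestamps([1500, None], "ms", "us"))
print(cast_timestamps([1500], "ms", "s", safe=False))
```

A safe cast of 1500 ms to seconds raises in this sketch, because the 500 ms remainder would be discarded; the unsafe cast drops it silently.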




> [C++] Implement (safe and unsafe) casts between timestamps and times of 
> different units
> ---
>
> Key: ARROW-1484
> URL: https://issues.apache.org/jira/browse/ARROW-1484
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Updated] (ARROW-1484) [C++] Implement (safe and unsafe) casts between timestamps and times of different units

2017-10-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1484:
--
Labels: pull-request-available  (was: )

> [C++] Implement (safe and unsafe) casts between timestamps and times of 
> different units
> ---
>
> Key: ARROW-1484
> URL: https://issues.apache.org/jira/browse/ARROW-1484
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Updated] (ARROW-1725) [Packaging] Upload .deb for Ubuntu 17.10

2017-10-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1725:
--
Labels: pull-request-available  (was: )

> [Packaging] Upload .deb for Ubuntu 17.10
> 
>
> Key: ARROW-1725
> URL: https://issues.apache.org/jira/browse/ARROW-1725
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1725) [Packaging] Upload .deb for Ubuntu 17.10

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218044#comment-16218044
 ] 

ASF GitHub Bot commented on ARROW-1725:
---

kou opened a new pull request #17: ARROW-1725: Upload .deb for Ubuntu 17.10
URL: https://github.com/apache/arrow-dist/pull/17
 
 
   




> [Packaging] Upload .deb for Ubuntu 17.10
> 
>
> Key: ARROW-1725
> URL: https://issues.apache.org/jira/browse/ARROW-1725
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Resolved] (ARROW-1724) [Packaging] Support Ubuntu 17.10

2017-10-24 Thread Kouhei Sutou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-1724.
-
Resolution: Fixed

> [Packaging] Support Ubuntu 17.10
> 
>
> Key: ARROW-1724
> URL: https://issues.apache.org/jira/browse/ARROW-1724
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Created] (ARROW-1725) [Packaging] Upload .deb for Ubuntu 17.10

2017-10-24 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-1725:
---

 Summary: [Packaging] Upload .deb for Ubuntu 17.10
 Key: ARROW-1725
 URL: https://issues.apache.org/jira/browse/ARROW-1725
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.8.0








[jira] [Updated] (ARROW-1724) [Packaging] Support Ubuntu 17.10

2017-10-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1724:
--
Labels: pull-request-available  (was: )

> [Packaging] Support Ubuntu 17.10
> 
>
> Key: ARROW-1724
> URL: https://issues.apache.org/jira/browse/ARROW-1724
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1134) [C++] Allow C++/CLI projects to build with Arrow

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217967#comment-16217967
 ] 

ASF GitHub Bot commented on ARROW-1134:
---

cpcloud commented on issue #1228: ARROW-1134: [C++] Support for C++/CLI 
compilation, add NULLPTR define to avoid using nullptr in public headers
URL: https://github.com/apache/arrow/pull/1228#issuecomment-339185435
 
 
   +1 LGTM




> [C++] Allow C++/CLI projects to build with Arrow
> -
>
> Key: ARROW-1134
> URL: https://issues.apache.org/jira/browse/ARROW-1134
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Toby Shaw
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Currently, the inclusion of <mutex> in some of Arrow's C++ headers prevents 
> C++/CLI code from building against it.
> From a C++/CLI project:
> #include 
> ...
> "#error directive: <mutex> is not supported when compiling with /clr or 
> /clr:pure."
> This could be patched by optionally relying on Boost's mutex/lock_guard 
> instead of std, or not exposing the #include <mutex> publicly.





[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217954#comment-16217954
 ] 

ASF GitHub Bot commented on ARROW-1588:
---

jacques-n commented on issue #1211: ARROW-1588: [C++/Format] Harden Decimal 
Format
URL: https://github.com/apache/arrow/pull/1211#issuecomment-339182615
 
 
   LGTM +1




> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Created] (ARROW-1724) [Packaging] Support Ubuntu 17.10

2017-10-24 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-1724:
---

 Summary: [Packaging] Support Ubuntu 17.10
 Key: ARROW-1724
 URL: https://issues.apache.org/jira/browse/ARROW-1724
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.8.0








[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217921#comment-16217921
 ] 

ASF GitHub Bot commented on ARROW-473:
--

wesm commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API for 
retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339175713
 
 
   Hm nope that’s the one




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-1716) [Format/JSON] Use string integer value for Decimals in JSON

2017-10-24 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217918#comment-16217918
 ] 

Phillip Cloud commented on ARROW-1716:
--

Yep. The JSON will contain the unscaled value.

> [Format/JSON] Use string integer value for Decimals in JSON
> ---
>
> Key: ARROW-1716
> URL: https://issues.apache.org/jira/browse/ARROW-1716
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java - Vectors
>Affects Versions: 0.7.1
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> Surprisingly, Java and C++ integration tests pass after ARROW-1588. This hides 
> a bug, because we're writing decimal values as hex encoded bytes.
> C++ and Java compare that the bytes are the same, but because C++ is 
> interpreting everything as little endian after ARROW-1588 and Java is big 
> endian, the numbers these bytes represent will be different in their 
> respective systems.
> I propose that instead of encoding DecimalArray/DecimalVector values as hex 
> encoded bytes, we store the integer as a string when writing Arrow 
> DecimalArray/DecimalVector data to JSON. This will allow us to compare that 
> the bytes have the same meaning in both systems.
> This requires a change to the way Arrow writes JSON.
> [~icexelloss] was extremely helpful in helping me get to the bottom of this.
> cc [~icexelloss] [~wesmckinn] [~jnadeau]
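
The problem described above can be reproduced in a few lines of Python. This is only a sketch: the helper names to_hex and from_hex are hypothetical, and it assumes 16-byte two's-complement values, which is not spelled out in this message.

```python
def to_hex(value, byteorder):
    """Hex-encode an integer as 16 bytes in the given byte order."""
    return value.to_bytes(16, byteorder=byteorder, signed=True).hex()


def from_hex(text, byteorder):
    """Decode 16 hex-encoded bytes back into an integer."""
    return int.from_bytes(bytes.fromhex(text), byteorder=byteorder, signed=True)


value = 12345
hex_text = to_hex(value, "little")    # written by a little-endian producer
misread = from_hex(hex_text, "big")   # read back by a big-endian consumer

# The hex text is identical on both sides, so a byte-for-byte comparison
# passes, yet the number it stands for differs between the two readers.
assert misread != value

# A decimal string carries no byte order, so both sides recover the same value.
string_text = str(value)
assert int(string_text) == value
```

Comparing string_text on both sides therefore checks the meaning of the value rather than its in-memory layout.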





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217914#comment-16217914
 ] 

ASF GitHub Bot commented on ARROW-473:
--

cpcloud commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API 
for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339175005
 
 
   @wesm I'm using Ubuntu 14.04 (see 
https://github.com/cpcloud/docker-impala/blob/master/Dockerfile#L8). Should I 
be using something else?




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Updated] (ARROW-1484) [C++] Implement (safe and unsafe) casts between timestamps and times of different units

2017-10-24 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1484:

Summary: [C++] Implement (safe and unsafe) casts between timestamps and 
times of different units  (was: [C++] Implement (safe and unsafe) casts between 
timestamps of different units)

> [C++] Implement (safe and unsafe) casts between timestamps and times of 
> different units
> ---
>
> Key: ARROW-1484
> URL: https://issues.apache.org/jira/browse/ARROW-1484
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>






[jira] [Assigned] (ARROW-1482) [C++] Implement casts between date32 and date64

2017-10-24 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1482:
---

Assignee: Wes McKinney

> [C++] Implement casts between date32 and date64
> ---
>
> Key: ARROW-1482
> URL: https://issues.apache.org/jira/browse/ARROW-1482
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
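
For orientation, date32 stores days since the UNIX epoch and date64 stores milliseconds since the UNIX epoch, so a cast between them is a multiplication or division by the number of milliseconds in a day. The sketch below is pure Python with hypothetical function names; the "safe" check on the narrowing direction is an assumption about how such a cast might behave, not a description of the eventual C++ implementation.

```python
from datetime import date

MS_PER_DAY = 86_400_000  # 24 * 60 * 60 * 1000


def date32_to_date64(days):
    """Days since 1970-01-01 -> milliseconds since 1970-01-01."""
    return days * MS_PER_DAY


def date64_to_date32(millis, safe=True):
    """Milliseconds since 1970-01-01 -> days since 1970-01-01."""
    if safe and millis % MS_PER_DAY != 0:
        raise ValueError("value is not a whole number of days")
    return millis // MS_PER_DAY


days = (date(2017, 10, 24) - date(1970, 1, 1)).days
millis = date32_to_date64(days)
assert date64_to_date32(millis) == days
```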






[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217865#comment-16217865
 ] 

ASF GitHub Bot commented on ARROW-1588:
---

wesm commented on issue #1211: ARROW-1588: [C++/Format] Harden Decimal Format
URL: https://github.com/apache/arrow/pull/1211#issuecomment-339165470
 
 
   I think we should merge this and then address the integration test issue and 
Java fixes in follow up patches. @jacques-n or @siddharthteotia can you please 
review?




> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217848#comment-16217848
 ] 

ASF GitHub Bot commented on ARROW-473:
--

wesm commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API for 
retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339161850
 
 
   @cpcloud what is the base linux image you're using? I was fooled by fPIC 
errors that had to do with gcc5 ABI (see 
https://github.com/conda-forge/boost-cpp-feedstock/blob/master/recipe/build.sh#L17,
 it seems to be there)




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-1409) [Format] Use for "page" attribute in Buffer in metadata

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217843#comment-16217843
 ] 

ASF GitHub Bot commented on ARROW-1409:
---

BryanCutler commented on issue #1225: ARROW-1409: [Format] Remove page id from 
Buffer metadata, increment metadata version number
URL: https://github.com/apache/arrow/pull/1225#issuecomment-339160655
 
 
   +1




> [Format] Use for "page" attribute in Buffer in metadata
> ---
>
> Key: ARROW-1409
> URL: https://issues.apache.org/jira/browse/ARROW-1409
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This attribute is currently unused in any Arrow implementation. I think the 
> original idea is that the "page" might indicate a particular shared memory 
> page, so that a record batch could be spread across multiple memory regions.
> The downside of this unused attribute is that Buffer metadata takes 24 bytes 
> instead of 16 due to padding. 
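
The 24-versus-16 figure follows from ordinary struct alignment. The sketch below computes it under stated assumptions: a 4-byte page field followed by 8-byte offset and length fields, each aligned to its own size. Those field widths and the helper name struct_size are assumptions for illustration; the issue text only gives the totals.

```python
def struct_size(field_sizes):
    """Size of a struct whose fields are each aligned to their own size."""
    offset = 0
    for size in field_sizes:
        # Round the offset up to the next multiple of the field's size.
        offset = (offset + size - 1) // size * size
        offset += size
    # The whole struct is padded to a multiple of its largest field.
    alignment = max(field_sizes)
    return (offset + alignment - 1) // alignment * alignment


with_page = struct_size([4, 8, 8])   # page, offset, length
without_page = struct_size([8, 8])   # offset, length
print(with_page, without_page)
```

With these widths the page field occupies 4 bytes and forces 4 bytes of padding before offset, so removing it saves 8 bytes per buffer entry.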





[jira] [Commented] (ARROW-1134) [C++] Allow C++/CLI projects to build with Arrow

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217839#comment-16217839
 ] 

ASF GitHub Bot commented on ARROW-1134:
---

wesm commented on issue #1228: ARROW-1134: [C++] Support for C++/CLI 
compilation, add NULLPTR define to avoid using nullptr in public headers
URL: https://github.com/apache/arrow/pull/1228#issuecomment-339159755
 
 
   In the meantime we'll need to add a linting script so that this work does 
not get undone by a future patch
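
   A linting check of the kind mentioned here could be as small as the following Python sketch. It is not the script that was eventually added to Arrow: the function name lint_header and the list of forbidden patterns are assumptions based on the pull request title (avoid nullptr in public headers) and on the <mutex> problem this issue is about.

```python
import re

# Patterns that should not appear in public headers. The list is an
# assumption based on the PR title and the issue description.
FORBIDDEN = {
    "nullptr": re.compile(r"\bnullptr\b"),
    "<mutex>": re.compile(r"#\s*include\s*<mutex>"),
}


def lint_header(text):
    """Return (line_number, rule_name) pairs for each violation in a header."""
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing line comments
        for name, pattern in FORBIDDEN.items():
            if pattern.search(code):
                violations.append((number, name))
    return violations


sample = "#include <mutex>\nint* p = nullptr;\nint* q = NULLPTR;  // not nullptr\n"
for number, name in lint_header(sample):
    print(f"line {number}: {name} is not allowed in public headers")
```

   A CI job could run such a check over the public headers and fail the build when the returned list is non-empty.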




> [C++] Allow C++/CLI projects to build with Arrow
> -
>
> Key: ARROW-1134
> URL: https://issues.apache.org/jira/browse/ARROW-1134
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Toby Shaw
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Currently, the inclusion of <mutex> in some of Arrow's C++ headers prevents 
> C++/CLI code from building against it.
> From a C++/CLI project:
> #include 
> ...
> "#error directive: <mutex> is not supported when compiling with /clr or 
> /clr:pure."
> This could be patched by optionally relying on Boost's mutex/lock_guard 
> instead of std, or not exposing the #include <mutex> publicly.





[jira] [Commented] (ARROW-1134) [C++] Allow C++/CLI projects to build with Arrow

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217838#comment-16217838
 ] 

ASF GitHub Bot commented on ARROW-1134:
---

wesm commented on issue #1228: ARROW-1134: [C++] Support for C++/CLI 
compilation, add NULLPTR define to avoid using nullptr in public headers
URL: https://github.com/apache/arrow/pull/1228#issuecomment-339159697
 
 
   @xhochy @cpcloud I would suggest we should merge this and wait for more 
feedback from C++/CLI users




> [C++] Allow C++/CLI projects to build with Arrow
> -
>
> Key: ARROW-1134
> URL: https://issues.apache.org/jira/browse/ARROW-1134
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Toby Shaw
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Currently, the inclusion of <mutex> in some of Arrow's C++ headers prevents 
> C++/CLI code from building against it.
> From a C++/CLI project:
> #include 
> ...
> "#error directive: <mutex> is not supported when compiling with /clr or 
> /clr:pure."
> This could be patched by optionally relying on Boost's mutex/lock_guard 
> instead of std, or not exposing the #include <mutex> publicly.





[jira] [Commented] (ARROW-1716) [Format/JSON] Use string integer value for Decimals in JSON

2017-10-24 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217830#comment-16217830
 ] 

Bryan Cutler commented on ARROW-1716:
-

+1. For reading values from JSON, we can just use the {{scale}} from the 
schema, right?
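
A minimal Python sketch of that idea: the JSON carries the unscaled integer as a string, and the reader combines it with the scale from the schema. The function names are hypothetical, and the values are built from digit tuples so that no rounding to the decimal context's precision can occur.

```python
from decimal import Decimal


def decimal_from_json(unscaled_text, scale):
    """Combine an unscaled integer string with the schema's scale."""
    unscaled = int(unscaled_text)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Built from a (sign, digits, exponent) tuple, so the value is exact.
    return Decimal((sign, digits, -scale))


def decimal_to_json(value, scale):
    """Write a Decimal as its unscaled integer string for the given scale."""
    sign, digits, exponent = value.as_tuple()
    if exponent != -scale:
        raise ValueError("value does not have the expected scale")
    unscaled = int("".join(str(d) for d in digits))
    return str(-unscaled if sign else unscaled)


print(decimal_from_json("12345", 2))
print(decimal_to_json(Decimal("123.45"), 2))
```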

> [Format/JSON] Use string integer value for Decimals in JSON
> ---
>
> Key: ARROW-1716
> URL: https://issues.apache.org/jira/browse/ARROW-1716
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java - Vectors
>Affects Versions: 0.7.1
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> Surprisingly, Java and C++ integration tests pass after ARROW-1588. This hides 
> a bug, because we're writing decimal values as hex encoded bytes.
> C++ and Java compare that the bytes are the same, but because C++ is 
> interpreting everything as little endian after ARROW-1588 and Java is big 
> endian, the numbers these bytes represent will be different in their 
> respective systems.
> I propose that instead of encoding DecimalArray/DecimalVector values as hex 
> encoded bytes, we store the integer as a string when writing Arrow 
> DecimalArray/DecimalVector data to JSON. This will allow us to compare that 
> the bytes have the same meaning in both systems.
> This requires a change to the way Arrow writes JSON.
> [~icexelloss] was extremely helpful in helping me get to the bottom of this.
> cc [~icexelloss] [~wesmckinn] [~jnadeau]





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217828#comment-16217828
 ] 

ASF GitHub Bot commented on ARROW-473:
--

cpcloud commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API 
for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339157813
 
 
   @AnkitAggarwalPEC I don't think you waited long enough to see if you could 
get to the build stage before posting that error message (it takes about 2ish 
minutes). I'm able to get to the point where I can start to build arrow, which 
fails because it looks like the static boost libs from conda-forge weren't 
compiled with position independent code (`-fPIC`) enabled.




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217816#comment-16217816
 ] 

ASF GitHub Bot commented on ARROW-1689:
---

njwhite commented on a change in pull request #1233: ARROW-1689: [Python] Allow 
user to request no data copies
URL: https://github.com/apache/arrow/pull/1233#discussion_r146709736
 
 

 ##
 File path: cpp/src/arrow/python/python-test.cc
 ##
 @@ -86,7 +86,7 @@ TEST(PandasConversionTest, TestObjectBlockWriteFails) {
 
   PyObject* out;
   Py_BEGIN_ALLOW_THREADS;
-  PandasOptions options;
+  PandasOptions options = {false, false};
 
 Review comment:
   Done!




> [Python] Categorical Indices Should Be Zero-Copy
> 
>
> Key: ARROW-1689
> URL: https://issues.apache.org/jira/browse/ARROW-1689
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Nick White
>  Labels: pull-request-available
>
> It seems like 
> [WriteIndices|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L955-L981]
>  could reuse some of the logic in 
> [ConvertValuesZeroCopy|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L1348-L1385]
>  to avoid copying the integer indices array?





[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217814#comment-16217814
 ] 

ASF GitHub Bot commented on ARROW-1689:
---

njwhite commented on a change in pull request #1233: ARROW-1689: [Python] Allow 
user to request no data copies
URL: https://github.com/apache/arrow/pull/1233#discussion_r146709719
 
 

 ##
 File path: cpp/src/arrow/status.h
 ##
 @@ -95,7 +95,8 @@ enum class StatusCode : char {
   PythonError = 12,
   PlasmaObjectExists = 20,
   PlasmaObjectNonexistent = 21,
-  PlasmaStoreFull = 22
+  PlasmaStoreFull = 22,
+  CopyRequired = 23
 
 Review comment:
   Done!




> [Python] Categorical Indices Should Be Zero-Copy
> 
>
> Key: ARROW-1689
> URL: https://issues.apache.org/jira/browse/ARROW-1689
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Nick White
>  Labels: pull-request-available
>
> It seems like 
> [WriteIndices|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L955-L981]
>  could reuse some of the logic in 
> [ConvertValuesZeroCopy|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L1348-L1385]
>  to avoid copying the integer indices array?





[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217813#comment-16217813
 ] 

ASF GitHub Bot commented on ARROW-1689:
---

njwhite commented on a change in pull request #1233: ARROW-1689: [Python] Allow 
user to request no data copies
URL: https://github.com/apache/arrow/pull/1233#discussion_r146709688
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -212,6 +212,18 @@ def test_float_no_nulls(self):
         schema = pa.schema(fields)
         self._check_pandas_roundtrip(df, expected_schema=schema)
 
+    def test_zero_copy_success(self):
+        result = pa.array([0, 1, 2]).to_pandas(zero_copy_only=True)
+        npt.assert_array_equal(result, [0, 1, 2])
+
+    def test_zero_copy_failure_on_object_types(self):
+        with self.assertRaises(pa.ArrowException):
+            pa.array(['A', 'B', 'C']).to_pandas(zero_copy_only=True)
+
+    def test_zero_copy_failure_when_nulls(self):
+        with self.assertRaises(pa.ArrowException):
+            pa.array([0, 1, None]).to_pandas(zero_copy_only=True)
 
 Review comment:
   Added!
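
For readers without pyarrow at hand, the behaviour these tests exercise can be sketched with the standard library alone. This is only an illustration under stated assumptions, not pyarrow's implementation: `to_view` and `CopyRequired` are hypothetical names, and a `memoryview` over an `array.array` stands in for a zero-copy result.

```python
from array import array


class CopyRequired(Exception):
    """Raised when a zero-copy conversion was requested but is impossible."""


def to_view(values, zero_copy_only=False):
    """Hypothetical helper: expose `values` without copying when possible.

    `values` is either an array.array (a contiguous buffer with no nulls)
    or a plain list that may contain None or arbitrary objects.
    """
    if isinstance(values, array):
        # The memoryview shares the array's buffer: no data is copied.
        return memoryview(values)
    if zero_copy_only:
        raise CopyRequired("nulls or object values need a copy")
    # Fallback path: build a new list, replacing None with NaN.
    return [float("nan") if v is None else v for v in values]


data = array("q", [0, 1, 2])
view = to_view(data, zero_copy_only=True)
data[0] = 42  # visible through the view, so nothing was copied
```

Raising instead of silently copying mirrors the intent of `zero_copy_only=True` in the tests above: the caller finds out immediately when the no-copy guarantee cannot be met.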




> [Python] Categorical Indices Should Be Zero-Copy
> 
>
> Key: ARROW-1689
> URL: https://issues.apache.org/jira/browse/ARROW-1689
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Nick White
>  Labels: pull-request-available
>
> It seems like 
> [WriteIndices|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L955-L981]
>  could reuse some of the logic in 
> [ConvertValuesZeroCopy|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L1348-L1385]
>  to avoid copying the integer indices array?





[jira] [Updated] (ARROW-1716) [Format/JSON] Use string integer value for Decimals in JSON

2017-10-24 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud updated ARROW-1716:
-
Issue Type: Improvement  (was: Bug)

> [Format/JSON] Use string integer value for Decimals in JSON
> ---
>
> Key: ARROW-1716
> URL: https://issues.apache.org/jira/browse/ARROW-1716
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java - Vectors
>Affects Versions: 0.7.1
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> Surprisingly, Java and C++ integration tests pass after ARROW-1588. This hides 
> a bug, because we're writing decimal values as hex-encoded bytes.
> C++ and Java check that the bytes are the same, but because C++ interprets 
> everything as little endian after ARROW-1588 and Java is big endian, the 
> numbers these bytes represent will differ between the two systems.
> I propose that instead of encoding DecimalArray/DecimalVector values as 
> hex-encoded bytes, we store the integer as a string when writing Arrow 
> DecimalArray/DecimalVector data to JSON. This will allow us to check that 
> the values have the same meaning in both systems.
> This requires a change to the way Arrow writes JSON.
> [~icexelloss] was extremely helpful in helping me get to the bottom of this.
> cc [~icexelloss] [~wesmckinn] [~jnadeau]
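
The endianness problem described above can be reproduced in a few lines. This is a sketch under assumptions, not Arrow's integration-test code: the value 12345 and the 16-byte width (for a 128-bit decimal's unscaled integer) are chosen only for illustration.

```python
# Sixteen bytes, as the unscaled value of a 128-bit decimal might be stored.
value = 12345
little = value.to_bytes(16, byteorder="little", signed=True)

# A byte-for-byte (hex) comparison passes if both sides emit the same bytes...
hex_text = little.hex()

# ...but each side decodes those bytes with its own byte order.
raw = bytes.fromhex(hex_text)
as_little = int.from_bytes(raw, byteorder="little", signed=True)
as_big = int.from_bytes(raw, byteorder="big", signed=True)

# Writing the integer as a decimal string removes the ambiguity.
as_string = str(value)
```

Here `as_little` recovers the original value while `as_big` does not, even though the hex text is identical, which is why comparing bytes alone can hide the bug.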





[jira] [Commented] (ARROW-1723) Windows: __declspec(dllexport) specified when building arrow static library

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217787#comment-16217787
 ] 

ASF GitHub Bot commented on ARROW-1723:
---

wesm commented on a change in pull request #1244: [ARROW-1723] add ARROW_STATIC 
to mark static libs on Windows
URL: https://github.com/apache/arrow/pull/1244#discussion_r146705359
 
 

 ##
 File path: cpp/cmake_modules/BuildUtils.cmake
 ##
 @@ -165,6 +165,8 @@ function(ADD_ARROW_LIB LIB_NAME)
   LIBRARY_OUTPUT_DIRECTORY "${BUILD_OUTPUT_ROOT_DIRECTORY}"
   OUTPUT_NAME ${LIB_NAME_STATIC})
 
+  target_compile_definitions(${LIB_NAME}_static PUBLIC ARROW_STATIC)
 
 Review comment:
   I'm not sure what this does. Does this only impact the link step for the 
static library? Are the .cc units compiled a single time or multiple times 
(once for static, once for shared)?




> Windows: __declspec(dllexport) specified when building arrow static library
> ---
>
> Key: ARROW-1723
> URL: https://issues.apache.org/jira/browse/ARROW-1723
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: John Jenkins
>  Labels: pull-request-available
>
> As I understand it, dllexport/dllimport should be left out when building and 
> using static libraries on Windows. A PR will follow shortly.





[jira] [Commented] (ARROW-1689) [Python] Categorical Indices Should Be Zero-Copy

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217785#comment-16217785
 ] 

ASF GitHub Bot commented on ARROW-1689:
---

njwhite commented on a change in pull request #1233: ARROW-1689: [Python] Allow 
user to request no data copies
URL: https://github.com/apache/arrow/pull/1233#discussion_r146705787
 
 

 ##
 File path: cpp/src/arrow/python/arrow_to_pandas.cc
 ##
 @@ -1542,26 +1566,19 @@ class ArrowDeserializer {
   }
 
   Status Visit(const DictionaryType& type) {
+if (options_.zero_copy_only) {
+  return Status::CopyRequired("DictionaryType needs copies");
+}
+
 auto block = std::make_shared(options_, nullptr, 
col_->length());
 RETURN_NOT_OK(block->Write(col_, 0, 0));
 
-auto dict_type = static_cast(col_->type().get());
-
 PyAcquireGIL lock;
 result_ = PyDict_New();
 RETURN_IF_PYERROR();
 
-PyObject* dictionary;
-
-// Release GIL before calling ConvertArrayToPandas, will be reacquired
-// there if needed
-lock.release();
-RETURN_NOT_OK(
-ConvertArrayToPandas(options_, dict_type->dictionary(), nullptr, 
));
-lock.acquire();
-
 
 Review comment:
   It's already been run by [this 
call](https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.cc#L1546)
 to Write 
[here](https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.cc#L1017-L1020)
 - so this change just reuses the same dictionary instead of building it again.




> [Python] Categorical Indices Should Be Zero-Copy
> 
>
> Key: ARROW-1689
> URL: https://issues.apache.org/jira/browse/ARROW-1689
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Nick White
>  Labels: pull-request-available
>
> It seems like 
> [WriteIndices|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L955-L981]
>  could reuse some of the logic in 
> [ConvertValuesZeroCopy|https://github.com/apache/arrow/blob/0c8b861f93884f2868eb631d8fceee3a8b8905ec/cpp/src/arrow/python/arrow_to_pandas.cc#L1348-L1385]
>  to avoid copying the integer indices array?





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217779#comment-16217779
 ] 

ASF GitHub Bot commented on ARROW-473:
--

cpcloud commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API 
for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339146866
 
 
   Ok, let me see what I can do. I'm going to pull this branch down and hack.




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217768#comment-16217768
 ] 

ASF GitHub Bot commented on ARROW-473:
--

wesm commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API for 
retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339144852
 
 
   The script was written when the image still blocked, so that needs to be 
fixed.




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217743#comment-16217743
 ] 

ASF GitHub Bot commented on ARROW-473:
--

cpcloud commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API 
for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339140764
 
 
   It takes around 2 minutes to fully start up. Is the script waiting for at 
least that amount of time before doing anything?
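
One way a test script could wait for a slow-starting service is to poll a readiness check with a deadline. The sketch below is an assumption-laden illustration, not the contents of the actual test script: `wait_until_ready` and `is_ready` are hypothetical names, and the 180-second default is only a guess padded from the "around 2 minutes" figure above.

```python
import time


def wait_until_ready(is_ready, timeout=180.0, interval=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `is_ready()` until it returns True or `timeout` seconds elapse.

    Returns True if the service became ready, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In practice `is_ready` might run a trivial query against the daemon and report whether it succeeded; that choice is left to the caller.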




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Updated] (ARROW-1723) Windows: __declspec(dllexport) specified when building arrow static library

2017-10-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1723:
--
Labels: pull-request-available  (was: )

> Windows: __declspec(dllexport) specified when building arrow static library
> ---
>
> Key: ARROW-1723
> URL: https://issues.apache.org/jira/browse/ARROW-1723
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: John Jenkins
>  Labels: pull-request-available
>
> As I understand it, dllexport/dllimport should be left out when building and 
> using static libraries on Windows. A PR will follow shortly.





[jira] [Commented] (ARROW-1723) Windows: __declspec(dllexport) specified when building arrow static library

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217736#comment-16217736
 ] 

ASF GitHub Bot commented on ARROW-1723:
---

JohnPJenkins opened a new pull request #1244: [ARROW-1723] add ARROW_STATIC to 
mark static libs on Windows
URL: https://github.com/apache/arrow/pull/1244
 
 
   Add a preprocessor macro ARROW_STATIC when doing static library builds on 
Windows. Clients developing/building off the static library will also need to 
define this - please let me know how this should be documented, if this is an 
acceptable approach.




> Windows: __declspec(dllexport) specified when building arrow static library
> ---
>
> Key: ARROW-1723
> URL: https://issues.apache.org/jira/browse/ARROW-1723
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: John Jenkins
>  Labels: pull-request-available
>
> As I understand it, dllexport/dllimport should be left out when building and 
> using static libraries on Windows. A PR will follow shortly.





[jira] [Created] (ARROW-1723) Windows: __declspec(dllexport) specified when building arrow static library

2017-10-24 Thread John Jenkins (JIRA)
John Jenkins created ARROW-1723:
---

 Summary: Windows: __declspec(dllexport) specified when building 
arrow static library
 Key: ARROW-1723
 URL: https://issues.apache.org/jira/browse/ARROW-1723
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: John Jenkins


As I understand it, dllexport/dllimport should be left out when building and 
using static libraries on Windows. A PR will follow shortly.





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217604#comment-16217604
 ] 

ASF GitHub Bot commented on ARROW-473:
--

AnkitAggarwalPEC commented on issue #1031: WIP ARROW-473: [C++/Python] Add 
public API for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339115923
 
 
   @cpcloud @wesm 
   Not connected to Impala, could not execute queries.
   Starting Impala Shell without Kerberos authentication
   Error connecting: TTransportException, Could not connect to arrow-hdfs:21000
   Not connected to Impala, could not execute queries.
   Starting Impala Shell without Kerberos authentication
   Connected to arrow-hdfs:21000
   Server version: impalad version 2.9.0-cdh5.12.0 RELEASE (build 
03c6ddbdcec39238be4f5b14a300d5c4f576097e)
   Query: select VERSION()
   Query submitted at: 2017-10-24 20:08:08 (Coordinator: 
http://arrow-hdfs:25000)
   ERROR: AnalysisException: This Impala daemon is not ready to accept user 
requests. Status: Waiting for catalog update from the StateStore.
   
   Does anything else need to be done other than running "./test_hdfs.sh"?




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217600#comment-16217600
 ] 

ASF GitHub Bot commented on ARROW-473:
--

AnkitAggarwalPEC commented on issue #1031: WIP ARROW-473: [C++/Python] Add 
public API for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339115923
 
 
   @cpcloud 
   Not connected to Impala, could not execute queries.
   Starting Impala Shell without Kerberos authentication
   Error connecting: TTransportException, Could not connect to arrow-hdfs:21000
   Not connected to Impala, could not execute queries.
   Starting Impala Shell without Kerberos authentication
   Connected to arrow-hdfs:21000
   Server version: impalad version 2.9.0-cdh5.12.0 RELEASE (build 
03c6ddbdcec39238be4f5b14a300d5c4f576097e)
   Query: select VERSION()
   Query submitted at: 2017-10-24 20:08:08 (Coordinator: 
http://arrow-hdfs:25000)
   ERROR: AnalysisException: This Impala daemon is not ready to accept user 
requests. Status: Waiting for catalog update from the StateStore.
   
   Does anything else need to be done other than running "./test_hdfs.sh"?




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217599#comment-16217599
 ] 

ASF GitHub Bot commented on ARROW-473:
--

AnkitAggarwalPEC commented on issue #1031: WIP ARROW-473: [C++/Python] Add 
public API for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339115923
 
 
   @cpcloud 
   Not connected to Impala, could not execute queries.
   Starting Impala Shell without Kerberos authentication
   Error connecting: TTransportException, Could not connect to arrow-hdfs:21000
   Not connected to Impala, could not execute queries.
   Starting Impala Shell without Kerberos authentication
   Connected to arrow-hdfs:21000
   Server version: impalad version 2.9.0-cdh5.12.0 RELEASE (build 
03c6ddbdcec39238be4f5b14a300d5c4f576097e)
   Query: select VERSION()
   Query submitted at: 2017-10-24 20:08:08 (Coordinator: 
http://arrow-hdfs:25000)
   ERROR: AnalysisException: This Impala daemon is not ready to accept user 
requests. Status: Waiting for catalog update from the StateStore.
   
   Does anything else need to be done other than running "./test_hdfs.sh"?




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-10-24 Thread Ethan Levine (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217560#comment-16217560
 ] 

Ethan Levine commented on ARROW-1710:
-

The BitVector is an extra object that has to be allocated (both in terms of the 
backing data and in terms of the Java objects involved). You'd also need to 
perform bit masking of the underlying data with every write, which could 
involve a cache miss if the data for the BitVector isn't neatly colocated with 
the actual data for the nullable vector.

Perhaps a tracking flag could be added to the nullable vectors, though. It 
would start out "false", and get set to "true" if you ever write a null value. 
That way you could avoid the extra allocation and computation involved with 
tracking the validity of each value in the case where there are no null values. 
This seems like it would be more complicated than just keeping non-nullable 
vectors around, however.
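
The tracking-flag idea can be sketched as follows. This is a Python illustration of the concept only, not Arrow's Java vector API: `LazyValidityVector` and all of its members are hypothetical, and the validity bitmap uses one bit per slot with 1 meaning "valid".

```python
class LazyValidityVector:
    """Hypothetical fixed-size int vector that allocates validity lazily."""

    def __init__(self, length):
        self._values = [0] * length
        self._validity = None   # no bitmap until the first null is written
        self.has_nulls = False  # the tracking flag: set once, never cleared

    def set(self, index, value):
        self._values[index] = value
        if self._validity is not None:
            self._set_bit(index, True)

    def set_null(self, index):
        if self._validity is None:
            # First null: allocate the bitmap with every slot marked valid.
            nbytes = (len(self._values) + 7) // 8
            self._validity = bytearray(b"\xff" * nbytes)
            self.has_nulls = True
        self._set_bit(index, False)

    def is_null(self, index):
        if self._validity is None:
            return False
        return not (self._validity[index // 8] >> (index % 8)) & 1

    def _set_bit(self, index, valid):
        byte, mask = index // 8, 1 << (index % 8)
        if valid:
            self._validity[byte] |= mask
        else:
            self._validity[byte] &= ~mask & 0xFF
```

While no null has been written, writes touch only the value storage, which is the saving the comment describes; the extra branch on every write is part of the added complexity it also warns about.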

> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
> Fix For: 0.8.0
>
>
> So far the consensus seems to be to remove all non-nullable vectors. 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217466#comment-16217466
 ] 

ASF GitHub Bot commented on ARROW-473:
--

cpcloud commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API 
for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339095977
 
 
   @AnkitAggarwalPEC:
   
   ```sh
   docker pull cpcloud86/impala:java8-1
   ```




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217452#comment-16217452
 ] 

ASF GitHub Bot commented on ARROW-473:
--

AnkitAggarwalPEC commented on issue #1031: WIP ARROW-473: [C++/Python] Add 
public API for retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339094756
 
 
   @wesm I'm really sorry for the late reply, but somehow the update from 
@cpcloud ended up in spam.
   
   @cpcloud Can you please specify how the tag needs to be updated to java8?




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-1524) [C++] More graceful solution for handling non-zero offsets on inputs and outputs in compute library

2017-10-24 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217428#comment-16217428
 ] 

Wes McKinney commented on ARROW-1524:
-

This is quite urgent: many of the cast kernel implementations do not properly 
account for the offset, so at present they will yield incorrect results when 
used on sliced arrays.

> [C++] More graceful solution for handling non-zero offsets on inputs and 
> outputs in compute library
> ---
>
> Key: ARROW-1524
> URL: https://issues.apache.org/jira/browse/ARROW-1524
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> Currently we must remember to shift by the offset. We should add some inline 
> utility functions to centralize this logic.
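
The kind of helper proposed here can be illustrated with a toy example. This is a sketch in Python rather than the C++ compute library, and `value_at` and `cast_to_float` are hypothetical names; a plain list stands in for a shared buffer.

```python
def value_at(values, offset, i):
    """Return logical element `i` of a slice that starts at `offset`."""
    return values[offset + i]


def cast_to_float(values, offset, length):
    """Toy 'cast kernel' that goes through the helper for every read."""
    return [float(value_at(values, offset, i)) for i in range(length)]


buffer = [10, 20, 30, 40, 50]

# A slice covering elements 2..4 shares the buffer and records offset=2.
sliced = cast_to_float(buffer, offset=2, length=3)

# Forgetting the offset reads from the start of the buffer instead.
wrong = [float(buffer[i]) for i in range(3)]
```

Routing every read through one function means the offset arithmetic lives in a single place, so a kernel cannot silently skip it.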





[jira] [Updated] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-473:
-
Labels: pull-request-available  (was: )

> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 





[jira] [Commented] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2017-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217399#comment-16217399
 ] 

ASF GitHub Bot commented on ARROW-473:
--

wesm commented on issue #1031: WIP ARROW-473: [C++/Python] Add public API for 
retrieving block locations for a particular HDFS file
URL: https://github.com/apache/arrow/pull/1031#issuecomment-339085286
 
 
   @AnkitAggarwalPEC pinging you on this




> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 


