[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258253#comment-16258253
 ] 

ASF GitHub Bot commented on ARROW-1795:
---

robertnishihara commented on issue #1327: ARROW-1795: [Plasma] Create flag to 
make Plasma store use a single memory-mapped file.
URL: https://github.com/apache/arrow/pull/1327#issuecomment-345478483
 
 
   Thanks for letting us know. We'll keep an eye out for it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Plasma C++] change evict policy
> 
>
> Key: ARROW-1795
> URL: https://issues.apache.org/jira/browse/ARROW-1795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Lu Qi 
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Case 1: Suppose the store has 8 GB of total memory and already holds 5 GB of
> data when another 6 GB of data arrives.
> If we ask to evict 6 GB, an exception is thrown saying that
> no object can be freed. This is because we did not count the 3 GB of remaining
> free space. If we count that 3 GB, we only need to ask for 3 GB, so
> we can evict the 5 GB of data and keep running.
> Case 2: Suppose the store has 10 GB of total memory and holds 1.5 GB of
> data when another 9 GB of data arrives.
> If we try to evict 10 * 20% = 2 GB, we crash. In this
> situation we need to
> evict only 9 + 1.5 - 10 = 0.5 GB.
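
The arithmetic in both cases can be sketched as follows. This is only an illustration of the calculation described in the issue, not Plasma's actual eviction code; the function name `bytes_to_evict` and its parameters are hypothetical, and units are whatever the caller uses consistently (GB here).

```python
def bytes_to_evict(capacity: float, used: float, incoming: float) -> float:
    """Return how much data must be evicted so that `incoming` fits.

    Hypothetical helper for illustration only (not part of Plasma).
    """
    if incoming > capacity:
        # No amount of eviction can make room for this object.
        raise ValueError("object is larger than the whole store")
    free = capacity - used
    # Only the shortfall beyond the remaining free space has to be evicted.
    return max(0.0, incoming - free)


case1 = bytes_to_evict(capacity=8, used=5, incoming=6)     # 3
case2 = bytes_to_evict(capacity=10, used=1.5, incoming=9)  # 0.5
```

Counting the free space first means the request never exceeds what is actually stored, which is the failure both cases describe.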



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258237#comment-16258237
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on issue #1330: wip: ARROW-1816: [Java] Resolve new vector 
classes structure for timestamp, date and maybe interval
URL: https://github.com/apache/arrow/pull/1330#issuecomment-345474775
 
 
   This PR is an RFC.
   
   I think the resulting timestamp vector (NullableTimestampVector) is still 
branch-free at the cell level. cc @jacques-n can you take a look and let me 
know what you think?




> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> was discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388





[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258235#comment-16258235
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss opened a new pull request #1330: wip: ARROW-1816: [Java] Resolve new 
vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330
 
 
   






[jira] [Updated] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1816:
--
Labels: pull-request-available  (was: )



[jira] [Assigned] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-18 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1693:
---

Assignee: Paul Taylor  (was: Brian Hulette)

> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}





[jira] [Created] (ARROW-1833) [Java] Add accessor methods for data buffers that skip null checking

2017-11-18 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1833:
---

 Summary: [Java] Add accessor methods for data buffers that skip 
null checking
 Key: ARROW-1833
 URL: https://issues.apache.org/jira/browse/ARROW-1833
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Wes McKinney
 Fix For: 0.9.0








[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-18 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258231#comment-16258231
 ] 

Wes McKinney commented on ARROW-1710:
-

+1. See ARROW-1833

> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
> Fix For: 0.8.0
>
>
> So far the consensus seems to be to remove all non-nullable vectors.





[jira] [Commented] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258230#comment-16258230
 ] 

ASF GitHub Bot commented on ARROW-1559:
---

wesm commented on issue #1266: ARROW-1559: [C++] Add Unique kernel and refactor 
DictionaryBuilder to be a stateful kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-345473160
 
 
   I think the hash functions we are using are pretty expensive. We don't need 
super high quality hash functions for this code; they only need to be 
reasonable while using limited CPU cycles. We're also going to want to add SSE4.2 
accelerated versions (since SSE4.2 has intrinsics for CRC32 hashes) that we 
select at runtime if the host processor supports them.
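
As a rough illustration of what "reasonable but cheap" can mean, here is a minimal sketch of a multiplicative hash combined with a power-of-two bitmask for picking a slot. It is an assumption for illustration only: the names `cheap_hash`, `slot_index`, and the `MULTIPLIER` constant are hypothetical and are not the hash functions used in this PR, nor the SSE4.2 CRC32 variant mentioned above.

```python
MASK64 = (1 << 64) - 1
# Hypothetical choice: any odd 64-bit constant works for this sketch.
MULTIPLIER = 0x9E3779B97F4A7C15


def cheap_hash(key: int) -> int:
    """Multiplicative hash: a single multiply, truncated to 64 bits."""
    return ((key & MASK64) * MULTIPLIER) & MASK64


def slot_index(key: int, table_size: int) -> int:
    """Map a key to a slot in a table whose size is a power of two."""
    if table_size <= 0 or (table_size & (table_size - 1)) != 0:
        raise ValueError("table_size must be a power of two")
    # For power-of-two sizes, masking is equivalent to `hash % table_size`
    # but avoids the integer division.
    return cheap_hash(key) & (table_size - 1)
```

For example, `slot_index(12345, 1024)` always falls in the range 0 to 1023, and a non-power-of-two size such as 1000 is rejected.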




> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> --
>
> Key: ARROW-1559
> URL: https://issues.apache.org/jira/browse/ARROW-1559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: Analytics, pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258227#comment-16258227
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345472682
 
 
   Here's what I'm seeing in the diff in the test directory:
   
   ```
js/test/Arrow.ts |57 +-
js/test/__snapshots__/reader-tests.ts.snap   |   497 -
js/test/__snapshots__/table-tests.ts.snap|  1815 ---
js/test/arrows/cpp/file/datetime.arrow   |   Bin 0 -> 6490 bytes
js/test/arrows/cpp/file/decimal.arrow|   Bin 0 -> 259090 bytes
js/test/arrows/cpp/file/dictionary.arrow |   Bin 0 -> 2562 bytes
js/test/arrows/cpp/file/nested.arrow |   Bin 0 -> 2218 bytes
js/test/arrows/cpp/file/primitive-empty.arrow|   Bin 0 -> 9498 bytes
js/test/arrows/cpp/file/primitive.arrow  |   Bin 0 -> 9442 bytes
js/test/arrows/cpp/file/simple.arrow |   Bin 0 -> 1154 bytes
js/test/arrows/cpp/file/struct_example.arrow |   Bin 0 -> 1538 bytes
js/test/arrows/cpp/stream/datetime.arrow |   Bin 0 -> 5076 bytes
js/test/arrows/cpp/stream/decimal.arrow  |   Bin 0 -> 255228 bytes
js/test/arrows/cpp/stream/dictionary.arrow   |   Bin 0 -> 2004 bytes
js/test/arrows/cpp/stream/nested.arrow   |   Bin 0 -> 1636 bytes
js/test/arrows/cpp/stream/primitive-empty.arrow  |   Bin 0 -> 6852 bytes
js/test/arrows/cpp/stream/primitive.arrow|   Bin 0 -> 7020 bytes
js/test/arrows/cpp/stream/simple.arrow   |   Bin 0 -> 748 bytes
js/test/arrows/cpp/stream/struct_example.arrow   |   Bin 0 -> 1124 bytes
js/test/arrows/file/dictionary.arrow |   Bin 2522 -> 0 bytes
js/test/arrows/file/dictionary2.arrow|   Bin 2762 -> 0 bytes
js/test/arrows/file/multi_dictionary.arrow   |   Bin 3482 -> 0 bytes
js/test/arrows/file/simple.arrow |   Bin 1642 -> 0 bytes
js/test/arrows/file/struct.arrow |   Bin 2354 -> 0 bytes
js/test/arrows/java/file/datetime.arrow  |   Bin 0 -> 6746 bytes
js/test/arrows/java/file/decimal.arrow   |   Bin 0 -> 259730 bytes
js/test/arrows/java/file/dictionary.arrow|   Bin 0 -> 2666 bytes
js/test/arrows/java/file/nested.arrow|   Bin 0 -> 2314 bytes
js/test/arrows/java/file/primitive-empty.arrow   |   Bin 0 -> 9778 bytes
js/test/arrows/java/file/primitive.arrow |   Bin 0 -> 10034 bytes
js/test/arrows/java/file/simple.arrow|   Bin 0 -> 1210 bytes
js/test/arrows/java/file/struct_example.arrow|   Bin 0 -> 1602 bytes
js/test/arrows/java/stream/datetime.arrow|   Bin 0 -> 5196 bytes
js/test/arrows/java/stream/decimal.arrow |   Bin 0 -> 255564 bytes
js/test/arrows/java/stream/dictionary.arrow  |   Bin 0 -> 2036 bytes
js/test/arrows/java/stream/nested.arrow  |   Bin 0 -> 1676 bytes
js/test/arrows/java/stream/primitive-empty.arrow |   Bin 0 -> 6916 bytes
js/test/arrows/java/stream/primitive.arrow   |   Bin 0 -> 7404 bytes
js/test/arrows/java/stream/simple.arrow  |   Bin 0 -> 772 bytes
js/test/arrows/java/stream/struct_example.arrow  |   Bin 0 -> 1148 bytes
js/test/arrows/json/datetime.json|  1091 ++
js/test/arrows/json/decimal.json | 33380 +++
js/test/arrows/json/dictionary.json  |   424 +
js/test/arrows/json/nested.json  |   384 +
js/test/arrows/json/primitive-empty.json |  1099 ++
js/test/arrows/json/primitive.json   |  1788 +++
js/test/arrows/json/simple.json  |66 +
js/test/arrows/json/struct_example.json  |   237 +
js/test/arrows/multi/count/records.arrow |   Bin 224 -> 0 bytes
js/test/arrows/multi/count/schema.arrow  |   Bin 184 -> 0 bytes
js/test/arrows/multi/latlong/records.arrow   |   Bin 352 -> 0 bytes
js/test/arrows/multi/latlong/schema.arrow|   Bin 264 -> 0 bytes
js/test/arrows/multi/origins/records.arrow   |   Bin 224 -> 0 bytes
js/test/arrows/multi/origins/schema.arrow|   Bin 1604 -> 0 bytes
js/test/arrows/stream/dictionary.arrow   |   Bin 1776 -> 0 bytes
js/test/arrows/stream/simple.arrow   |   Bin 1188 -> 0 bytes
js/test/arrows/stream/struct.arrow   |   Bin 1884 -> 0 bytes
js/test/integration-tests.ts |   114 +
js/test/reader-tests.ts  |69 +-
js/test/table-tests.ts   |   175 +-
js/test/test-config.ts  

[jira] [Resolved] (ARROW-1575) [Python] Add pyarrow.column factory function

2017-11-18 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1575.
-
Resolution: Fixed

Issue resolved by pull request 1329
[https://github.com/apache/arrow/pull/1329]

> [Python] Add pyarrow.column factory function
> 
>
> Key: ARROW-1575
> URL: https://issues.apache.org/jira/browse/ARROW-1575
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This would internally call {{Column.from_array}} as appropriate





[jira] [Commented] (ARROW-1575) [Python] Add pyarrow.column factory function

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258225#comment-16258225
 ] 

ASF GitHub Bot commented on ARROW-1575:
---

wesm closed pull request #1329: ARROW-1575: [Python] Add tests for 
pyarrow.column factory function
URL: https://github.com/apache/arrow/pull/1329
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 1a9d23db4..591f32975 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -166,6 +166,15 @@ def chunked_array(arrays, type=None):
 def column(object field_or_name, arr):
 """
 Create Column object from field/string and array-like data
+
+Parameters
+--
+field_or_name : string or Field
+arr : Array, list of Arrays, or ChunkedArray
+
+Returns
+---
+column : Column
 """
 cdef:
 Field boxed_field
diff --git a/python/pyarrow/tests/test_table.py 
b/python/pyarrow/tests/test_table.py
index 428222466..cd05fb8e1 100644
--- a/python/pyarrow/tests/test_table.py
+++ b/python/pyarrow/tests/test_table.py
@@ -21,42 +21,55 @@
 import pandas as pd
 import pytest
 
-from pyarrow.compat import unittest
 import pyarrow as pa
 
 
-class TestColumn(unittest.TestCase):
-
-def test_basics(self):
-data = [
-pa.array([-10, -5, 0, 5, 10])
-]
-table = pa.Table.from_arrays(data, names=['a'])
-column = table.column(0)
-assert column.name == 'a'
-assert column.length() == 5
-assert len(column) == 5
-assert column.shape == (5,)
-assert column.to_pylist() == [-10, -5, 0, 5, 10]
-
-def test_from_array(self):
-arr = pa.array([0, 1, 2, 3, 4])
-
-col1 = pa.Column.from_array('foo', arr)
-col2 = pa.Column.from_array(pa.field('foo', arr.type), arr)
-
-assert col1.equals(col2)
-
-def test_pandas(self):
-data = [
-pa.array([-10, -5, 0, 5, 10])
-]
-table = pa.Table.from_arrays(data, names=['a'])
-column = table.column(0)
-series = column.to_pandas()
-assert series.name == 'a'
-assert series.shape == (5,)
-assert series.iloc[0] == -10
+def test_column_basics():
+data = [
+pa.array([-10, -5, 0, 5, 10])
+]
+table = pa.Table.from_arrays(data, names=['a'])
+column = table.column(0)
+assert column.name == 'a'
+assert column.length() == 5
+assert len(column) == 5
+assert column.shape == (5,)
+assert column.to_pylist() == [-10, -5, 0, 5, 10]
+
+
+def test_column_factory_function():
+# ARROW-1575
+arr = pa.array([0, 1, 2, 3, 4])
+arr2 = pa.array([5, 6, 7, 8])
+
+col1 = pa.Column.from_array('foo', arr)
+col2 = pa.Column.from_array(pa.field('foo', arr.type), arr)
+
+assert col1.equals(col2)
+
+col3 = pa.column('foo', [arr, arr2])
+chunked_arr = pa.chunked_array([arr, arr2])
+col4 = pa.column('foo', chunked_arr)
+assert col3.equals(col4)
+
+col5 = pa.column('foo', arr.to_pandas())
+assert col5.equals(pa.column('foo', arr))
+
+# Type mismatch
+with pytest.raises(ValueError):
+pa.Column.from_array(pa.field('foo', pa.string()), arr)
+
+
+def test_column_to_pandas():
+data = [
+pa.array([-10, -5, 0, 5, 10])
+]
+table = pa.Table.from_arrays(data, names=['a'])
+column = table.column(0)
+series = column.to_pandas()
+assert series.name == 'a'
+assert series.shape == (5,)
+assert series.iloc[0] == -10
 
 
 def test_recordbatch_basics():


 






[jira] [Resolved] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-18 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1827.
-
   Resolution: Fixed
Fix Version/s: 0.8.0

Issue resolved by pull request 1326
[https://github.com/apache/arrow/pull/1326]

> [Java] Add checkstyle config file and header file
> -
>
> Key: ARROW-1827
> URL: https://issues.apache.org/jira/browse/ARROW-1827
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258222#comment-16258222
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

wesm closed pull request #1326: ARROW-1827: [Java] Add checkstyle file and 
license template
URL: https://github.com/apache/arrow/pull/1326
 
 
   


diff --git a/java/.gitattributes b/java/.gitattributes
new file mode 100644
index 0..cb02d8226
--- /dev/null
+++ b/java/.gitattributes
@@ -0,0 +1,3 @@
+.gitattributes export-ignore
+.gitignore export-ignore
+/dev export-ignore
diff --git a/java/dev/checkstyle/checkstyle.license 
b/java/dev/checkstyle/checkstyle.license
new file mode 100644
index 0..c06c90cd2
--- /dev/null
+++ b/java/dev/checkstyle/checkstyle.license
@@ -0,0 +1,17 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
diff --git a/java/dev/checkstyle/checkstyle.xml 
b/java/dev/checkstyle/checkstyle.xml
new file mode 100644
index 0..14dbede16
--- /dev/null
+++ b/java/dev/checkstyle/checkstyle.xml
@@ -0,0 +1,254 @@
+[XML body not preserved in the archive; the surviving fragments are a DOCTYPE reference to http://www.puppycrawl.com/dtds/configuration_1_3.dtd and an ftp:// pattern]
diff --git a/java/dev/checkstyle/suppressions.xml 
b/java/dev/checkstyle/suppressions.xml
new file mode 100644
index 0..36697256d
--- /dev/null
+++ b/java/dev/checkstyle/suppressions.xml
@@ -0,0 +1,31 @@
+[XML body not preserved in the archive; the surviving DOCTYPE fragment references http://www.puppycrawl.com/dtds/suppressions_1_1.dtd]
diff --git a/java/pom.xml b/java/pom.xml
index 0a0f2e0ce..c479d651f 100644
--- a/java/pom.xml
+++ b/java/pom.xml
@@ -304,7 +304,9 @@
   
 
 
-  google_checks.xml
+  dev/checkstyle/checkstyle.xml
+  dev/checkstyle/checkstyle.license
+  dev/checkstyle/suppressions.xml
   UTF-8
   true
   

[jira] [Created] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-18 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-1832:


 Summary: [JS] Implement JSON reader for integration tests
 Key: ARROW-1832
 URL: https://issues.apache.org/jira/browse/ARROW-1832
 Project: Apache Arrow
  Issue Type: New Feature
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette


Implementing a JSON reader will allow us to write a "validate" script for the 
consumer half of the integration tests.





[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258157#comment-16258157
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

icexelloss commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151844456
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+
+
+http://www.puppycrawl.com/dtds/configuration_1_3.dtd;>
+
+


[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258152#comment-16258152
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

icexelloss commented on issue #1326: ARROW-1827: [Java] Add checkstyle file and 
license template
URL: https://github.com/apache/arrow/pull/1326#issuecomment-345463040
 
 
   I re-added the checkstyle.xml file from the checkstyle project.
   
   I verified checkstyle works as expected and git archive doesn't include the 
checkstyle files.
   
   This should be good to go.






[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258123#comment-16258123
 ] 

Jacques Nadeau commented on ARROW-1710:
---

Agreed on both removing the nullable prefix and adding "dirty" accessor/mutator 
methods, but I think the latter could come in 0.9.0 since it is an enhancement to 
the API.



[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references

2017-11-18 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258073#comment-16258073
 ] 

Uwe L. Korn commented on ARROW-1769:


We could add `gc.collect()` calls in various places, but I would like 
to refrain from that and hope that the next pandas release arrives soon.

> Python: pyarrow.parquet.write_to_dataset creates cyclic references
> --
>
> Key: ARROW-1769
> URL: https://issues.apache.org/jira/browse/ARROW-1769
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Uwe L. Korn
> Fix For: 0.8.0
>
>
> See https://github.com/apache/arrow/issues/1285 for the initial issue. Having 
> cyclic references is a valid state in Python, as they can be cleaned up by the 
> garbage collector. But because the garbage collector runs at points that are 
> not predictable for the user, and we normally deal with large objects here, 
> we should get rid of the cyclic reference so that data is evicted from main 
> memory as soon as possible.
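
The behaviour described in the issue can be reproduced with a small, self-contained sketch. It assumes CPython (reference counting plus a cycle collector) and uses a hypothetical `Node` class; it is not the code path in `write_to_dataset`, only an illustration of why objects in a reference cycle linger until the collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical object holding a large payload and a link to a peer."""

    def __init__(self, payload):
        self.payload = payload
        self.other = None


def make_cycle():
    a = Node(bytearray(1024))
    b = Node(bytearray(1024))
    a.other = b
    b.other = a  # a <-> b now form a reference cycle
    # Return only a weak reference so no strong reference escapes.
    return weakref.ref(a)


gc.disable()  # make the timing deterministic for the demonstration
try:
    ref = make_cycle()
    # Reference counting alone cannot free the cycle.
    alive_before_collect = ref() is not None
    gc.collect()
    # The cycle collector reclaims both objects.
    alive_after_collect = ref() is not None
finally:
    gc.enable()
```

Breaking the cycle explicitly (for example, setting `other` to `None` before the objects go out of scope) lets reference counting free the memory immediately, without waiting for a collection.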





[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258071#comment-16258071
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

xhochy commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151838026
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+
+
+http://www.puppycrawl.com/dtds/configuration_1_3.dtd;>
+
+


[jira] [Commented] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258064#comment-16258064
 ] 

ASF GitHub Bot commented on ARROW-1559:
---

xhochy commented on a change in pull request #1266: ARROW-1559: [C++] Add 
Unique kernel and refactor DictionaryBuilder to be a stateful kernel
URL: https://github.com/apache/arrow/pull/1266#discussion_r151837230
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/hash.cc
 ##
 @@ -0,0 +1,880 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/hash.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arrow/builder.h"
+#include "arrow/compute/context.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/util-internal.h"
+#include "arrow/util/hash-util.h"
+
+namespace arrow {
+namespace compute {
+
+namespace {
+
+// Initially 1024 elements
+static constexpr int64_t kInitialHashTableSize = 1 << 10;
+
+typedef int32_t hash_slot_t;
+static constexpr hash_slot_t kHashSlotEmpty = std::numeric_limits<hash_slot_t>::max();
+
+// The maximum load factor for the hash table before resizing.
+static constexpr double kMaxHashTableLoad = 0.7;
+
+enum class SIMDMode : char { NOSIMD, SSE4, AVX2 };
+
+#define CHECK_IMPLEMENTED(KERNEL, FUNCNAME, TYPE)  \
+  if (!KERNEL) {   \
+std::stringstream ss;  \
+ss << FUNCNAME << " not implemented for " << type->ToString(); \
+return Status::NotImplemented(ss.str());   \
+  }
+
+Status NewHashTable(int64_t size, MemoryPool* pool, std::shared_ptr* 
out) {
+  auto hash_table = std::make_shared(pool);
+
+  RETURN_NOT_OK(hash_table->Resize(sizeof(hash_slot_t) * size));
+  int32_t* slots = reinterpret_cast<int32_t*>(hash_table->mutable_data());
+  std::fill(slots, slots + size, kHashSlotEmpty);
+
+  *out = hash_table;
+  return Status::OK();
+}
+
+// This is a slight design concession -- some hash actions have the possibility
+// of failure. Rather than introduce extra error checking into all actions, we
+// will raise an internal exception so that only the actions where errors can
+// occur will experience the extra overhead
+class HashException : public std::exception {
+ public:
+  explicit HashException(const std::string& msg, StatusCode code = 
StatusCode::Invalid)
+  : msg_(msg), code_(code) {}
+
+  ~HashException() throw() {}
+
+  const char* what() const throw() override;
+
+  StatusCode code() const { return code_; }
+
+ private:
+  std::string msg_;
+  StatusCode code_;
+};
+
+const char* HashException::what() const throw() { return msg_.c_str(); }
+
+class HashTable {
+ public:
+  HashTable(const std::shared_ptr& type, MemoryPool* pool)
+  : type_(type),
+pool_(pool),
+initialized_(false),
+hash_table_(nullptr),
+hash_slots_(nullptr),
+hash_table_size_(0),
+mod_bitmask_(0) {}
+
+  virtual ~HashTable() {}
+
+  virtual Status Append(const ArrayData& input) = 0;
+  virtual Status Flush(std::vector* out) = 0;
+  virtual Status GetDictionary(std::shared_ptr* out) = 0;
+
+ protected:
+  Status Init(int64_t elements);
+
+  std::shared_ptr type_;
+  MemoryPool* pool_;
+  bool initialized_;
+
+  // The hash table contains integer indices that reference the set of observed
+  // distinct values
+  std::shared_ptr hash_table_;
+  hash_slot_t* hash_slots_;
+
+  /// Size of the table. Must be a power of 2.
+  int64_t hash_table_size_;
+
+  // Store hash_table_size_ - 1, so that j & mod_bitmask_ is equivalent to j %
+  // hash_table_size_, but uses far fewer CPU cycles
+  int64_t mod_bitmask_;
+};
+
+Status HashTable::Init(int64_t elements) {
+  DCHECK_EQ(elements, BitUtil::NextPower2(elements));
+  RETURN_NOT_OK(NewHashTable(elements, pool_, &hash_table_));
+  hash_slots_ = reinterpret_cast<hash_slot_t*>(hash_table_->mutable_data());
+  hash_table_size_ = elements;
+  mod_bitmask_ = elements - 1;
+  initialized_ = true;
+  return Status::OK();
+}
+
+template 
+class HashTableKernel : public HashTable {};
+
+// Types of 

[jira] [Assigned] (ARROW-1755) [C++] Add build options for MSVC to use static runtime libraries

2017-11-18 Thread Max Risuhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Risuhin reassigned ARROW-1755:
--

Assignee: Max Risuhin

> [C++] Add build options for MSVC to use static runtime libraries
> 
>
> Key: ARROW-1755
> URL: https://issues.apache.org/jira/browse/ARROW-1755
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Max Risuhin
>



