[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy
[ https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258253#comment-16258253 ] ASF GitHub Bot commented on ARROW-1795: --- robertnishihara commented on issue #1327: ARROW-1795: [Plasma] Create flag to make Plasma store use a single memory-mapped file. URL: https://github.com/apache/arrow/pull/1327#issuecomment-345478483 Thanks for letting us know. We'll keep an eye out for it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Plasma C++] change evict policy > > > Key: ARROW-1795 > URL: https://issues.apache.org/jira/browse/ARROW-1795 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Lu Qi >Assignee: Robert Nishihara >Priority: Minor > Labels: pull-request-available > Fix For: 0.8.0 > > > Case 1: say we have 8 GB of free memory in total. We store 5 GB of data, and then another 6 GB arrives. If we choose to evict 6 GB, the store throws an exception saying that no object can be freed. This is because we didn't count the remaining 3 GB of free space. If we count those 3 GB, we only need to ask for 3 GB, so we can evict the 5 GB of data and still proceed. > Case 2: suppose we have 10 GB of free memory and store 1.5 GB of data, and then another 9 GB arrives. If we try to evict 10 * 20% = 2 GB, we will crash, since only 1.5 GB is stored. In this situation we need to evict only 9 + 1.5 - 10 = 0.5 GB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
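The shortfall rule described in both cases can be checked with a few lines of arithmetic. This is a minimal sketch, not Plasma's actual eviction code: `bytes_to_evict` and its parameters are hypothetical names, and it assumes every stored object is evictable.

```python
GB = 1 << 30  # bytes per (binary) gigabyte, used for the worked cases below


def bytes_to_evict(requested: int, free: int, evictable: int) -> int:
    """Hypothetical helper: how many bytes must be evicted to fit a request.

    Only the shortfall beyond the space that is already free is evicted,
    rather than the full request size or a fixed fraction of capacity.
    """
    shortfall = max(0, requested - free)
    if shortfall > evictable:
        # Even evicting everything stored would not make enough room.
        raise MemoryError("request cannot be satisfied by eviction")
    return shortfall


# Case 1: 8 GB capacity, 5 GB stored, a 6 GB request arrives.
print(bytes_to_evict(6 * GB, free=8 * GB - 5 * GB, evictable=5 * GB) / GB)  # 3.0

# Case 2: 10 GB capacity, 1.5 GB stored, a 9 GB request arrives.
stored = 3 * GB // 2
print(bytes_to_evict(9 * GB, free=10 * GB - stored, evictable=stored) / GB)  # 0.5
```

By contrast, evicting the full 6 GB in case 1, or a fixed 20% (2 GB) in case 2, asks for more than is stored (5 GB and 1.5 GB respectively), which is the failure the issue describes.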
[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval
[ https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258237#comment-16258237 ] ASF GitHub Bot commented on ARROW-1816: --- icexelloss commented on issue #1330: wip: ARROW-1816: [Java] Resolve new vector classes structure for timestamp, date and maybe interval URL: https://github.com/apache/arrow/pull/1330#issuecomment-345474775 This PR is an RFC. I think the resulting timestamp vector (NullableTimestampVector) is still branch-free at the cell level. cc @jacques-n, can you take a look and let me know what you think? > [Java] Resolve new vector classes structure for timestamp, date and maybe > interval > -- > > Key: ARROW-1816 > URL: https://issues.apache.org/jira/browse/ARROW-1816 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Li Jin > Labels: pull-request-available > Fix For: 0.8.0 > > > Personally, I think having 8 vector classes for timestamps is not great. This was discussed at some point during the PR: > https://github.com/apache/arrow/pull/1203#discussion_r145241388
[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval
[ https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258235#comment-16258235 ] ASF GitHub Bot commented on ARROW-1816: --- icexelloss opened a new pull request #1330: wip: ARROW-1816: [Java] Resolve new vector classes structure for timestamp, date and maybe interval URL: https://github.com/apache/arrow/pull/1330
[jira] [Updated] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval
[ https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1816: -- Labels: pull-request-available (was: )
[jira] [Assigned] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-1693: --- Assignee: Paul Taylor (was: Brian Hulette) > [JS] Error reading dictionary-encoded integration test files > > > Key: ARROW-1693 > URL: https://issues.apache.org/jira/browse/ARROW-1693 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Brian Hulette >Assignee: Paul Taylor > Labels: pull-request-available > Fix For: 0.8.0 > > Attachments: dictionary-cpp.arrow, dictionary-java.arrow, dictionary.json
> The JS implementation crashes when reading the dictionary test case from the integration tests.
> To replicate, first generate the test files with java and cpp impls:
{code}
$ cd ${ARROW_HOME}/integration/
$ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
$ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
$ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
{code}
> Attempt to read the files with the JS impl:
{code}
$ cd ${ARROW_HOME}/js/
$ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
{code}
> Both files result in an error for me on [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]: {{TypeError: Cannot read property 'buffer' of undefined}}
[jira] [Created] (ARROW-1833) [Java] Add accessor methods for data buffers that skip null checking
Wes McKinney created ARROW-1833: --- Summary: [Java] Add accessor methods for data buffers that skip null checking Key: ARROW-1833 URL: https://issues.apache.org/jira/browse/ARROW-1833 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Wes McKinney Fix For: 0.9.0
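As a rough illustration of what a null-check-skipping accessor means, the Python sketch below models a fixed-width vector with an Arrow-style validity bitmap (bit i set means slot i is non-null, least-significant bit first within each byte). The class and method names (`NullableIntVector`, `get`, `get_unchecked`) are hypothetical and are not the Arrow Java API.

```python
from typing import List, Optional


class NullableIntVector:
    """Toy fixed-width vector: a data buffer plus a validity bitmap."""

    def __init__(self, values: List[int], validity: bytes):
        self.values = values      # data buffer; contents of null slots are unspecified
        self.validity = validity  # bit i (LSB-first within each byte) set => slot i is valid

    def is_set(self, i: int) -> bool:
        return bool((self.validity[i >> 3] >> (i & 7)) & 1)

    def get(self, i: int) -> Optional[int]:
        # Checked accessor: consults the validity bitmap on every call.
        return self.values[i] if self.is_set(i) else None

    def get_unchecked(self, i: int) -> int:
        # Hypothetical "dirty" accessor: reads the data buffer directly.
        # The caller must already know the slot is valid (e.g. null count is 0).
        return self.values[i]


vec = NullableIntVector([10, 0, 30], validity=bytes([0b101]))
print([vec.get(i) for i in range(3)])            # [10, None, 30]
print([vec.get_unchecked(i) for i in range(3)])  # [10, 0, 30]
```

The unchecked path saves one bitmap lookup per element in tight loops over vectors known to have no nulls; the trade-off is that misuse silently returns whatever value sits in a null slot.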
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258231#comment-16258231 ] Wes McKinney commented on ARROW-1710: - +1. See ARROW-1833 > [Java] Decide what to do with non-nullable vectors in new vector class > hierarchy > - > > Key: ARROW-1710 > URL: https://issues.apache.org/jira/browse/ARROW-1710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Li Jin >Assignee: Bryan Cutler > Fix For: 0.8.0 > > > So far the consensus seems to be to remove all non-nullable vectors.
[jira] [Commented] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)
[ https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258230#comment-16258230 ] ASF GitHub Bot commented on ARROW-1559: --- wesm commented on issue #1266: ARROW-1559: [C++] Add Unique kernel and refactor DictionaryBuilder to be a stateful kernel URL: https://github.com/apache/arrow/pull/1266#issuecomment-345473160 I think the hash functions we are using are pretty expensive. We don't need super-high-quality hash functions for this code; they only need to be reasonable while using limited CPU cycles. We're also going to want to add SSE4.2-accelerated versions (since SSE4.2 has intrinsics for CRC32 hashes) that we select at runtime if the host processor supports it. > [C++] Kernel implementations for "unique" (compute distinct elements of array) > -- > > Key: ARROW-1559 > URL: https://issues.apache.org/jira/browse/ARROW-1559 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Uwe L. Korn > Labels: Analytics, pull-request-available > Fix For: 0.8.0 > >
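For reference, the SSE4.2 `crc32` instruction computes CRC-32C (the Castagnoli polynomial), which differs from the CRC-32 used by zlib. The Python sketch below is a portable bit-at-a-time CRC-32C of the kind a scalar fallback would compute; it is far slower than the single hardware instruction, which is the motivation for selecting the accelerated path at runtime. The names `crc32c` and `hash_slot` are hypothetical, not Arrow's API.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reversed


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C with the conventional initial/final inversion.

    Pass a previous result as `crc` to continue a checksum across chunks.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def hash_slot(value: int, mod_bitmask: int) -> int:
    """Hypothetical use as a hash: map a 64-bit integer to a table slot."""
    return crc32c(value.to_bytes(8, "little", signed=True)) & mod_bitmask


print(hex(crc32c(b"123456789")))  # 0xe3069283 (the standard CRC-32C check value)
```

In C++, an accelerated path could instead call the `_mm_crc32_u64` family of intrinsics from `<nmmintrin.h>`, chosen once at startup after a CPU feature check; the loop above only serves as the portable reference.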
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258227#comment-16258227 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-345472682 Here's what I'm seeing in the diff in the test directory:
```
js/test/Arrow.ts | 57 +-
js/test/__snapshots__/reader-tests.ts.snap | 497 -
js/test/__snapshots__/table-tests.ts.snap | 1815 ---
js/test/arrows/cpp/file/datetime.arrow | Bin 0 -> 6490 bytes
js/test/arrows/cpp/file/decimal.arrow | Bin 0 -> 259090 bytes
js/test/arrows/cpp/file/dictionary.arrow | Bin 0 -> 2562 bytes
js/test/arrows/cpp/file/nested.arrow | Bin 0 -> 2218 bytes
js/test/arrows/cpp/file/primitive-empty.arrow | Bin 0 -> 9498 bytes
js/test/arrows/cpp/file/primitive.arrow | Bin 0 -> 9442 bytes
js/test/arrows/cpp/file/simple.arrow | Bin 0 -> 1154 bytes
js/test/arrows/cpp/file/struct_example.arrow | Bin 0 -> 1538 bytes
js/test/arrows/cpp/stream/datetime.arrow | Bin 0 -> 5076 bytes
js/test/arrows/cpp/stream/decimal.arrow | Bin 0 -> 255228 bytes
js/test/arrows/cpp/stream/dictionary.arrow | Bin 0 -> 2004 bytes
js/test/arrows/cpp/stream/nested.arrow | Bin 0 -> 1636 bytes
js/test/arrows/cpp/stream/primitive-empty.arrow | Bin 0 -> 6852 bytes
js/test/arrows/cpp/stream/primitive.arrow | Bin 0 -> 7020 bytes
js/test/arrows/cpp/stream/simple.arrow | Bin 0 -> 748 bytes
js/test/arrows/cpp/stream/struct_example.arrow | Bin 0 -> 1124 bytes
js/test/arrows/file/dictionary.arrow | Bin 2522 -> 0 bytes
js/test/arrows/file/dictionary2.arrow | Bin 2762 -> 0 bytes
js/test/arrows/file/multi_dictionary.arrow | Bin 3482 -> 0 bytes
js/test/arrows/file/simple.arrow | Bin 1642 -> 0 bytes
js/test/arrows/file/struct.arrow | Bin 2354 -> 0 bytes
js/test/arrows/java/file/datetime.arrow | Bin 0 -> 6746 bytes
js/test/arrows/java/file/decimal.arrow | Bin 0 -> 259730 bytes
js/test/arrows/java/file/dictionary.arrow | Bin 0 -> 2666 bytes
js/test/arrows/java/file/nested.arrow | Bin 0 -> 2314 bytes
js/test/arrows/java/file/primitive-empty.arrow | Bin 0 -> 9778 bytes
js/test/arrows/java/file/primitive.arrow | Bin 0 -> 10034 bytes
js/test/arrows/java/file/simple.arrow | Bin 0 -> 1210 bytes
js/test/arrows/java/file/struct_example.arrow | Bin 0 -> 1602 bytes
js/test/arrows/java/stream/datetime.arrow | Bin 0 -> 5196 bytes
js/test/arrows/java/stream/decimal.arrow | Bin 0 -> 255564 bytes
js/test/arrows/java/stream/dictionary.arrow | Bin 0 -> 2036 bytes
js/test/arrows/java/stream/nested.arrow | Bin 0 -> 1676 bytes
js/test/arrows/java/stream/primitive-empty.arrow | Bin 0 -> 6916 bytes
js/test/arrows/java/stream/primitive.arrow | Bin 0 -> 7404 bytes
js/test/arrows/java/stream/simple.arrow | Bin 0 -> 772 bytes
js/test/arrows/java/stream/struct_example.arrow | Bin 0 -> 1148 bytes
js/test/arrows/json/datetime.json | 1091 ++
js/test/arrows/json/decimal.json | 33380 +++
js/test/arrows/json/dictionary.json | 424 +
js/test/arrows/json/nested.json | 384 +
js/test/arrows/json/primitive-empty.json | 1099 ++
js/test/arrows/json/primitive.json | 1788 +++
js/test/arrows/json/simple.json | 66 +
js/test/arrows/json/struct_example.json | 237 +
js/test/arrows/multi/count/records.arrow | Bin 224 -> 0 bytes
js/test/arrows/multi/count/schema.arrow | Bin 184 -> 0 bytes
js/test/arrows/multi/latlong/records.arrow | Bin 352 -> 0 bytes
js/test/arrows/multi/latlong/schema.arrow | Bin 264 -> 0 bytes
js/test/arrows/multi/origins/records.arrow | Bin 224 -> 0 bytes
js/test/arrows/multi/origins/schema.arrow | Bin 1604 -> 0 bytes
js/test/arrows/stream/dictionary.arrow | Bin 1776 -> 0 bytes
js/test/arrows/stream/simple.arrow | Bin 1188 -> 0 bytes
js/test/arrows/stream/struct.arrow | Bin 1884 -> 0 bytes
js/test/integration-tests.ts | 114 +
js/test/reader-tests.ts | 69 +-
js/test/table-tests.ts | 175 +-
js/test/test-config.ts
[jira] [Resolved] (ARROW-1575) [Python] Add pyarrow.column factory function
[ https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1575. - Resolution: Fixed Issue resolved by pull request 1329 [https://github.com/apache/arrow/pull/1329] > [Python] Add pyarrow.column factory function > > > Key: ARROW-1575 > URL: https://issues.apache.org/jira/browse/ARROW-1575 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > This would internally call {{Column.from_array}} as appropriate
[jira] [Commented] (ARROW-1575) [Python] Add pyarrow.column factory function
[ https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258225#comment-16258225 ] ASF GitHub Bot commented on ARROW-1575: --- wesm closed pull request #1329: ARROW-1575: [Python] Add tests for pyarrow.column factory function URL: https://github.com/apache/arrow/pull/1329 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 1a9d23db4..591f32975 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -166,6 +166,15 @@ def chunked_array(arrays, type=None):
 def column(object field_or_name, arr):
     """
     Create Column object from field/string and array-like data
+
+    Parameters
+    --
+    field_or_name : string or Field
+    arr : Array, list of Arrays, or ChunkedArray
+
+    Returns
+    ---
+    column : Column
     """
     cdef:
         Field boxed_field
diff --git a/python/pyarrow/tests/test_table.py b/python/pyarrow/tests/test_table.py
index 428222466..cd05fb8e1 100644
--- a/python/pyarrow/tests/test_table.py
+++ b/python/pyarrow/tests/test_table.py
@@ -21,42 +21,55 @@
 import pandas as pd
 import pytest
 
-from pyarrow.compat import unittest
 import pyarrow as pa
 
 
-class TestColumn(unittest.TestCase):
-
-    def test_basics(self):
-        data = [
-            pa.array([-10, -5, 0, 5, 10])
-        ]
-        table = pa.Table.from_arrays(data, names=['a'])
-        column = table.column(0)
-        assert column.name == 'a'
-        assert column.length() == 5
-        assert len(column) == 5
-        assert column.shape == (5,)
-        assert column.to_pylist() == [-10, -5, 0, 5, 10]
-
-    def test_from_array(self):
-        arr = pa.array([0, 1, 2, 3, 4])
-
-        col1 = pa.Column.from_array('foo', arr)
-        col2 = pa.Column.from_array(pa.field('foo', arr.type), arr)
-
-        assert col1.equals(col2)
-
-    def test_pandas(self):
-        data = [
-            pa.array([-10, -5, 0, 5, 10])
-        ]
-        table = pa.Table.from_arrays(data, names=['a'])
-        column = table.column(0)
-        series = column.to_pandas()
-        assert series.name == 'a'
-        assert series.shape == (5,)
-        assert series.iloc[0] == -10
+def test_column_basics():
+    data = [
+        pa.array([-10, -5, 0, 5, 10])
+    ]
+    table = pa.Table.from_arrays(data, names=['a'])
+    column = table.column(0)
+    assert column.name == 'a'
+    assert column.length() == 5
+    assert len(column) == 5
+    assert column.shape == (5,)
+    assert column.to_pylist() == [-10, -5, 0, 5, 10]
+
+
+def test_column_factory_function():
+    # ARROW-1575
+    arr = pa.array([0, 1, 2, 3, 4])
+    arr2 = pa.array([5, 6, 7, 8])
+
+    col1 = pa.Column.from_array('foo', arr)
+    col2 = pa.Column.from_array(pa.field('foo', arr.type), arr)
+
+    assert col1.equals(col2)
+
+    col3 = pa.column('foo', [arr, arr2])
+    chunked_arr = pa.chunked_array([arr, arr2])
+    col4 = pa.column('foo', chunked_arr)
+    assert col3.equals(col4)
+
+    col5 = pa.column('foo', arr.to_pandas())
+    assert col5.equals(pa.column('foo', arr))
+
+    # Type mismatch
+    with pytest.raises(ValueError):
+        pa.Column.from_array(pa.field('foo', pa.string()), arr)
+
+
+def test_column_to_pandas():
+    data = [
+        pa.array([-10, -5, 0, 5, 10])
+    ]
+    table = pa.Table.from_arrays(data, names=['a'])
+    column = table.column(0)
+    series = column.to_pandas()
+    assert series.name == 'a'
+    assert series.shape == (5,)
+    assert series.iloc[0] == -10
 
 
 def test_recordbatch_basics():
[jira] [Resolved] (ARROW-1827) [Java] Add checkstyle config file and header file
[ https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1827. - Resolution: Fixed Fix Version/s: 0.8.0 Issue resolved by pull request 1326 [https://github.com/apache/arrow/pull/1326] > [Java] Add checkstyle config file and header file > - > > Key: ARROW-1827 > URL: https://issues.apache.org/jira/browse/ARROW-1827 > Project: Apache Arrow > Issue Type: Task >Reporter: Li Jin >Assignee: Li Jin > Labels: pull-request-available > Fix For: 0.8.0 > >
[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file
[ https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258222#comment-16258222 ] ASF GitHub Bot commented on ARROW-1827: --- wesm closed pull request #1326: ARROW-1827: [Java] Add checkstyle file and license template URL: https://github.com/apache/arrow/pull/1326 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):
diff --git a/java/.gitattributes b/java/.gitattributes
new file mode 100644
index 0..cb02d8226
--- /dev/null
+++ b/java/.gitattributes
@@ -0,0 +1,3 @@
+.gitattributes export-ignore
+.gitignore export-ignore
+/dev export-ignore
diff --git a/java/dev/checkstyle/checkstyle.license b/java/dev/checkstyle/checkstyle.license
new file mode 100644
index 0..c06c90cd2
--- /dev/null
+++ b/java/dev/checkstyle/checkstyle.license
@@ -0,0 +1,17 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
diff --git a/java/dev/checkstyle/checkstyle.xml b/java/dev/checkstyle/checkstyle.xml
new file mode 100644
index 0..14dbede16
--- /dev/null
+++ b/java/dev/checkstyle/checkstyle.xml
@@ -0,0 +1,254 @@
[254-line XML file; its markup was stripped by the mail archive. Surviving fragments: http://www.puppycrawl.com/dtds/configuration_1_3.dtd;> and ftp://"/>]
diff --git a/java/dev/checkstyle/suppressions.xml b/java/dev/checkstyle/suppressions.xml
new file mode 100644
index 0..36697256d
--- /dev/null
+++ b/java/dev/checkstyle/suppressions.xml
@@ -0,0 +1,31 @@
[31-line XML file; its markup was stripped by the mail archive. Surviving fragment: http://www.puppycrawl.com/dtds/suppressions_1_1.dtd;>]
diff --git a/java/pom.xml b/java/pom.xml
index 0a0f2e0ce..c479d651f 100644
--- a/java/pom.xml
+++ b/java/pom.xml
@@ -304,7 +304,9 @@
-  google_checks.xml
+  dev/checkstyle/checkstyle.xml
+  dev/checkstyle/checkstyle.license
+  dev/checkstyle/suppressions.xml
   UTF-8
   true
[jira] [Created] (ARROW-1832) [JS] Implement JSON reader for integration tests
Brian Hulette created ARROW-1832: Summary: [JS] Implement JSON reader for integration tests Key: ARROW-1832 URL: https://issues.apache.org/jira/browse/ARROW-1832 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Reporter: Brian Hulette Assignee: Brian Hulette Implementing a JSON reader will allow us to write a "validate" script for the consumer half of the integration tests.
[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file
[ https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258157#comment-16258157 ] ASF GitHub Bot commented on ARROW-1827: --- icexelloss commented on a change in pull request #1326: ARROW-1827: [Java] Add checkstyle file and license template URL: https://github.com/apache/arrow/pull/1326#discussion_r151844456 ## File path: java/checkstyle/checkstyle.xml ## @@ -0,0 +1,238 @@
[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file
[ https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258152#comment-16258152 ] ASF GitHub Bot commented on ARROW-1827: --- icexelloss commented on issue #1326: ARROW-1827: [Java] Add checkstyle file and license template URL: https://github.com/apache/arrow/pull/1326#issuecomment-345463040 I re-added the checkstyle.xml file from the Checkstyle project. I verified that Checkstyle works as expected and that git archive doesn't include the checkstyle files. This should be good to go.
[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy
[ https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258123#comment-16258123 ] Jacques Nadeau commented on ARROW-1710: --- Agree with both removing the nullable prefix and adding "dirty" accessor/mutator methods, but I think the latter could come in 0.9.0 since it is an enhancement to the API.
[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references
[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258073#comment-16258073 ] Uwe L. Korn commented on ARROW-1769: We could add `gc.collect()` calls in various places, but I would like to refrain from that and hope that the next pandas release arrives soon. > Python: pyarrow.parquet.write_to_dataset creates cyclic references > -- > > Key: ARROW-1769 > URL: https://issues.apache.org/jira/browse/ARROW-1769 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.1 >Reporter: Uwe L. Korn > Fix For: 0.8.0 > > > See https://github.com/apache/arrow/issues/1285 for the initial issue. Having cyclic references is a valid state in Python, as they can be cleaned up by the garbage collector. But since the garbage collector normally runs at a point that is not clear to the user, and we normally deal with large objects here, we should get rid of the cyclic reference so that data is evicted from main memory as soon as possible.
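The point about collection timing can be demonstrated in CPython, where reference counting frees an object immediately unless it is part of a cycle. The sketch below is illustrative only: `Frame` is a hypothetical stand-in, not a pyarrow or pandas class, and the behaviour shown is specific to CPython's reference-counting implementation.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a large object, such as a table being written."""

    def __init__(self, cyclic: bool):
        self.payload = bytearray(10 * 1024 * 1024)  # 10 MiB of data
        if cyclic:
            self.self_ref = self  # reference cycle: the refcount never drops to 0


def alive_after_del(cyclic: bool):
    """Return (alive right after `del`, alive after an explicit gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        obj = Frame(cyclic)
        probe = weakref.ref(obj)
        del obj
        before = probe() is not None
        gc.collect()
        after = probe() is not None
        return before, after
    finally:
        if was_enabled:
            gc.enable()


print(alive_after_del(cyclic=False))  # (False, False): freed by refcounting at once
print(alive_after_del(cyclic=True))   # (True, False): lingers until the collector runs
```

Breaking the cycle is what lets the 10 MiB payload go away at `del` time instead of whenever the cyclic collector next happens to run.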
[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file
[ https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258071#comment-16258071 ] ASF GitHub Bot commented on ARROW-1827: --- xhochy commented on a change in pull request #1326: ARROW-1827: [Java] Add checkstyle file and license template URL: https://github.com/apache/arrow/pull/1326#discussion_r151838026 ## File path: java/checkstyle/checkstyle.xml ## @@ -0,0 +1,238 @@
[jira] [Commented] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)
[ https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258064#comment-16258064 ] ASF GitHub Bot commented on ARROW-1559: --- xhochy commented on a change in pull request #1266: ARROW-1559: [C++] Add Unique kernel and refactor DictionaryBuilder to be a stateful kernel URL: https://github.com/apache/arrow/pull/1266#discussion_r151837230
## File path: cpp/src/arrow/compute/kernels/hash.cc
##
@@ -0,0 +1,880 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/hash.h"
+
[7 system #include lines; header names stripped by the mail archive]
+
+#include "arrow/builder.h"
+#include "arrow/compute/context.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/util-internal.h"
+#include "arrow/util/hash-util.h"
+
+namespace arrow {
+namespace compute {
+
+namespace {
+
+// Initially 1024 elements
+static constexpr int64_t kInitialHashTableSize = 1 << 10;
+
+typedef int32_t hash_slot_t;
+static constexpr hash_slot_t kHashSlotEmpty = std::numeric_limits::max();
+
+// The maximum load factor for the hash table before resizing.
+static constexpr double kMaxHashTableLoad = 0.7;
+
+enum class SIMDMode : char { NOSIMD, SSE4, AVX2 };
+
+#define CHECK_IMPLEMENTED(KERNEL, FUNCNAME, TYPE)                  \
+  if (!KERNEL) {                                                   \
+    std::stringstream ss;                                          \
+    ss << FUNCNAME << " not implemented for " << type->ToString(); \
+    return Status::NotImplemented(ss.str());                       \
+  }
+
+Status NewHashTable(int64_t size, MemoryPool* pool, std::shared_ptr* out) {
+  auto hash_table = std::make_shared(pool);
+
+  RETURN_NOT_OK(hash_table->Resize(sizeof(hash_slot_t) * size));
+  int32_t* slots = reinterpret_cast(hash_table->mutable_data());
+  std::fill(slots, slots + size, kHashSlotEmpty);
+
+  *out = hash_table;
+  return Status::OK();
+}
+
+// This is a slight design concession -- some hash actions have the possibility
+// of failure. Rather than introduce extra error checking into all actions, we
+// will raise an internal exception so that only the actions where errors can
+// occur will experience the extra overhead
+class HashException : public std::exception {
+ public:
+  explicit HashException(const std::string& msg, StatusCode code = StatusCode::Invalid)
+      : msg_(msg), code_(code) {}
+
+  ~HashException() throw() {}
+
+  const char* what() const throw() override;
+
+  StatusCode code() const { return code_; }
+
+ private:
+  std::string msg_;
+  StatusCode code_;
+};
+
+const char* HashException::what() const throw() { return msg_.c_str(); }
+
+class HashTable {
+ public:
+  HashTable(const std::shared_ptr& type, MemoryPool* pool)
+      : type_(type),
+        pool_(pool),
+        initialized_(false),
+        hash_table_(nullptr),
+        hash_slots_(nullptr),
+        hash_table_size_(0),
+        mod_bitmask_(0) {}
+
+  virtual ~HashTable() {}
+
+  virtual Status Append(const ArrayData& input) = 0;
+  virtual Status Flush(std::vector* out) = 0;
+  virtual Status GetDictionary(std::shared_ptr* out) = 0;
+
+ protected:
+  Status Init(int64_t elements);
+
+  std::shared_ptr type_;
+  MemoryPool* pool_;
+  bool initialized_;
+
+  // The hash table contains integer indices that reference the set of observed
+  // distinct values
+  std::shared_ptr hash_table_;
+  hash_slot_t* hash_slots_;
+
+  /// Size of the table. Must be a power of 2.
+  int64_t hash_table_size_;
+
+  // Store hash_table_size_ - 1, so that j & mod_bitmask_ is equivalent to j %
+  // hash_table_size_, but uses far fewer CPU cycles
+  int64_t mod_bitmask_;
+};
+
+Status HashTable::Init(int64_t elements) {
+  DCHECK_EQ(elements, BitUtil::NextPower2(elements));
+  RETURN_NOT_OK(NewHashTable(elements, pool_, &hash_table_));
+  hash_slots_ = reinterpret_cast (hash_table_->mutable_data());
+  hash_table_size_ = elements;
+  mod_bitmask_ = elements - 1;
+  initialized_ = true;
+  return Status::OK();
+}
+
+template
+class HashTableKernel : public HashTable {};
+
+// Types of
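The table quoted in the review above relies on two standard tricks: a power-of-two size, so that `j & mod_bitmask_` replaces `j % hash_table_size_`, and a maximum load factor (0.7) that triggers a resize. The Python sketch below shows those ideas in an open-addressing table that records distinct values in first-seen order, as a unique kernel would. It is an illustration, not a port of `hash.cc`: the class name `UniqueHashTable` is hypothetical, Python's built-in `hash()` stands in for Arrow's hash functions, and linear probing is an assumption (the quoted excerpt is truncated before the probing code).

```python
from typing import Hashable, List

EMPTY = -1  # marks an unused slot (the C++ code uses the maximum int32 value)


class UniqueHashTable:
    """Maps each distinct value to its index in `uniques` (first-seen order)."""

    def __init__(self, capacity: int = 1 << 10, max_load: float = 0.7):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.max_load = max_load
        self.uniques: List[Hashable] = []
        self._allocate(capacity)

    def _allocate(self, capacity: int) -> None:
        self.slots = [EMPTY] * capacity
        self.mod_bitmask = capacity - 1  # j & mod_bitmask == j % capacity

    def _find_slot(self, value: Hashable) -> int:
        j = hash(value) & self.mod_bitmask
        while self.slots[j] != EMPTY and self.uniques[self.slots[j]] != value:
            j = (j + 1) & self.mod_bitmask  # linear probing, wrapping around
        return j

    def insert(self, value: Hashable) -> int:
        """Return the index of `value`, adding it if it has not been seen."""
        j = self._find_slot(value)
        if self.slots[j] != EMPTY:
            return self.slots[j]
        index = len(self.uniques)
        self.uniques.append(value)
        self.slots[j] = index
        if len(self.uniques) > self.max_load * len(self.slots):
            self._allocate(2 * len(self.slots))  # doubling keeps a power of two
            for i, v in enumerate(self.uniques):
                self.slots[self._find_slot(v)] = i
        return index


table = UniqueHashTable(capacity=4)
print([table.insert(v) for v in [3, 1, 3, 2, 1, 5]])  # [0, 1, 0, 2, 1, 3]
print(table.uniques, len(table.slots))                # [3, 1, 2, 5] 8
```

Resizing as soon as the load exceeds 0.7 guarantees an empty slot always exists, so the probe loop terminates; doubling on resize preserves the power-of-two invariant that makes the bitmask valid.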
[jira] [Assigned] (ARROW-1755) [C++] Add build options for MSVC to use static runtime libraries
[ https://issues.apache.org/jira/browse/ARROW-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Risuhin reassigned ARROW-1755: -- Assignee: Max Risuhin > [C++] Add build options for MSVC to use static runtime libraries > > > Key: ARROW-1755 > URL: https://issues.apache.org/jira/browse/ARROW-1755 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Max Risuhin >