[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262161#comment-16262161
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on a change in pull request #1343: [WIP] ARROW-1832: [JS] 
Implement JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#discussion_r152498412
 
 

 ##
 File path: js/src/reader/arrow.ts
 ##
 @@ -31,6 +31,7 @@ import ByteBuffer = flatbuffers.ByteBuffer;
 import Footer = File_.org.apache.arrow.flatbuf.Footer;
 import Field = Schema_.org.apache.arrow.flatbuf.Field;
 import Schema = Schema_.org.apache.arrow.flatbuf.Schema;
+import Buffer = Schema_.org.apache.arrow.flatbuf.Buffer;
 
 Review comment:
   @TheNeuralBit typo?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1842) ParquetDataset.read(): selectively reading array column

2017-11-22 Thread Young-Jun Ko (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262219#comment-16262219
 ] 

Young-Jun Ko commented on ARROW-1842:
-

Hey Wes,

Thanks for the quick reply. In the end, addressing the column as 
`c.list.element` worked!
I'll close this as a duplicate.

Best,
yj
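
For reference, a minimal pyarrow sketch of the selection that ended up working
(the dataset path below is hypothetical; the leaf name `c.list.element` is the
one from the report):

```
import pyarrow.parquet as pq

# Open the Spark-written dataset (path is hypothetical).
dataset = pq.ParquetDataset('/path/to/spark_output')

# Selecting the array column by its top-level name ('c') raised KeyError,
# because columns are resolved by their flattened Parquet leaf names.
# Addressing the leaf directly reads the column:
table = dataset.read(columns=['a', 'c.list.element'])
print(table.schema)
```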

> ParquetDataset.read(): selectively reading array column
> ---
>
> Key: ARROW-1842
> URL: https://issues.apache.org/jira/browse/ARROW-1842
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Young-Jun Ko
>
> Scenario:
> - created a dataframe in spark and saved it as parquet
> - columns include simple types, e.g. String, but also an array of doubles
> Issue:
> I can read the whole data using ParquetDataset in pyarrow.
> I tried reading selectively a simple type => works
> I tried reading selectively the array column => key error in the following 
> place:
> KeyError: 'c'
> /home/hadoop/Python/lib/python2.7/site-packages/pyarrow/_parquet.pyx in 
> pyarrow._parquet.ParquetReader.column_name_idx 
> (/arrow/python/build/temp.linux-x86_64-2.7/_parquet.cxx:9777)()
> 513 self.column_idx_map[col_bytes] = i
> 514 
> --> 515 return self.column_idx_map[tobytes(column_name)]
> When I just read the whole dataset, I get the correct metadata
> pyarrow.Table
> a: string
> b: string
> c: list<element: double>
>   child 0, element: double
> d: int64
> metadata
> 
> {'org.apache.spark.sql.parquet.row.metadata': 
> '{"type":"struct","fields":[{"name":"a","type":"string","nullable":true,"metadata":{}},{"name":"b","type":"string","nullable":true,"metadata":{}},{"name":"c","type":{"type":"array","elementType":"double","containsNull":false},"nullable":true,"metadata":{}},{"name":"d","type":"long","nullable":false,"metadata":{}}]}'}
> I might just be missing the correct naming convention of the array column.
> But then this name should be reflected in the metadata.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (ARROW-1842) ParquetDataset.read(): selectively reading array column

2017-11-22 Thread Young-Jun Ko (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young-Jun Ko closed ARROW-1842.
---
Resolution: Duplicate

As pointed out in the comments, duplicate of 
https://issues.apache.org/jira/browse/ARROW-1684

> ParquetDataset.read(): selectively reading array column
> ---
>
> Key: ARROW-1842
> URL: https://issues.apache.org/jira/browse/ARROW-1842
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Young-Jun Ko
>
> Scenario:
> - created a dataframe in spark and saved it as parquet
> - columns include simple types, e.g. String, but also an array of doubles
> Issue:
> I can read the whole data using ParquetDataset in pyarrow.
> I tried reading selectively a simple type => works
> I tried reading selectively the array column => key error in the following 
> place:
> KeyError: 'c'
> /home/hadoop/Python/lib/python2.7/site-packages/pyarrow/_parquet.pyx in 
> pyarrow._parquet.ParquetReader.column_name_idx 
> (/arrow/python/build/temp.linux-x86_64-2.7/_parquet.cxx:9777)()
> 513 self.column_idx_map[col_bytes] = i
> 514 
> --> 515 return self.column_idx_map[tobytes(column_name)]
> When I just read the whole dataset, I get the correct metadata
> pyarrow.Table
> a: string
> b: string
> c: list<element: double>
>   child 0, element: double
> d: int64
> metadata
> 
> {'org.apache.spark.sql.parquet.row.metadata': 
> '{"type":"struct","fields":[{"name":"a","type":"string","nullable":true,"metadata":{}},{"name":"b","type":"string","nullable":true,"metadata":{}},{"name":"c","type":{"type":"array","elementType":"double","containsNull":false},"nullable":true,"metadata":{}},{"name":"d","type":"long","nullable":false,"metadata":{}}]}'}
> I might just be missing the correct naming convention of the array column.
> But then this name should be reflected in the metadata.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262331#comment-16262331
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on a change in pull request #1343: [WIP] ARROW-1832: [JS] 
Implement JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#discussion_r152535906
 
 

 ##
 File path: js/src/reader/json.ts
 ##
 @@ -0,0 +1,238 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import { flatbuffers } from 'flatbuffers';
+import { Vector } from '../vector/vector';
+import { TypedArray, TypedArrayConstructor } from '../vector/types';
+import { BinaryVector, BoolVector, Utf8Vector, Int8Vector,
+ Int16Vector, Int32Vector, Int64Vector, Uint8Vector,
+ Uint16Vector, Uint32Vector, Uint64Vector,
+ Float32Vector, Float64Vector, ListVector, StructVector } from 
'../vector/arrow';
+
+import { fb, FieldBuilder, FieldNodeBuilder } from '../format/arrow';
+
+import { TextEncoder } from 'text-encoding-utf-8';
+const encoder = new TextEncoder('utf-8');
+
+export function* readJSON(jsonString: string): IterableIterator<Vector[]> {
+let obj: any = JSON.parse(jsonString);
+let schema: any = {};
+for (const field of obj.schema.fields) {
+schema[field.name] = field;
+}
+
+for (const batch of obj.batches) {
+yield batch.columns.map((column: any): Vector => 
readVector(schema[column.name], column));
+}
+}
+
+function readVector(field: any, column: any): Vector {
+return readDictionaryVector(field, column) || readValueVector(field, 
column);
+}
+
+function readDictionaryVector(field: any, column: any) {
+if (field.name == column.name) { return null; } else { return null; }
+}
+
+function readValueVector(field: any, column: any): Vector {
+switch (field.type.name) {
+//case "NONE": return readNullVector(field, column);
+//case "null": return readNullVector(field, column);
+//case "map": return readMapVector(field, column);
+case 'int': return readIntVector(field, column);
+case 'bool': return readBoolVector(field, column);
+//case "date": return readDateVector(field, column);
+case 'list': return readListVector(field, column);
+case 'utf8': return readUtf8Vector(field, column);
+//case "time": return readTimeVector(field, column);
+//case "union": return readUnionVector(field, column);
+case 'binary': return readBinaryVector(field, column);
+//case "decimal": return readDecimalVector(field, column);
+case 'struct': return readStructVector(field, column);
+case 'floatingpoint': return readFloatVector(field, column);
+//case "timestamp": return readTimestampVector(field, column);
+//case "fixedsizelist": return readFixedSizeListVector(field, column);
+//case "fixedsizebinary": return readFixedSizeBinaryVector(field, 
column);
+}
+throw new Error(`Unrecognized Vector { name: ${field.name}, type: 
${field.type.name} }`);
+}
+
+function readIntVector(field: any, column: any) {
+if (field.type.isSigned) {
+switch (field.type.bitWidth) {
+case  8: return new  Int8Vector(readNumeric(field, column, 
Int8Array));
+case 16: return new Int16Vector(readNumeric(field, column, 
Int16Array));
+case 32: return new Int32Vector(readNumeric(field, column, 
Int32Array));
+case 64: return new Int64Vector(readInt64(field, column, 
Int32Array));
+}
+}
+switch (field.type.bitWidth) {
+case  8: return new  Uint8Vector(readNumeric(field, column, 
Uint8Array));
+case 16: return new Uint16Vector(readNumeric(field, column, 
Uint16Array));
+case 32: return new Uint32Vector(readNumeric(field, column, 
Uint32Array));
+case 64: return new Uint64Vector(readInt64(field, column, 
Uint32Array));
+}
+throw new Error(`Unrecognized Int { isSigned: ${field.type.isSigned}, 
bitWidth: ${field.type.bitWidth} }`);
+}
+
+function readBoolVector(fieldObj: any, column: any) {
+const field = fieldFromJSO

[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262340#comment-16262340
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on a change in pull request #1343: [WIP] ARROW-1832: [JS] 
Implement JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#discussion_r152536620
 
 

 ##
 File path: js/src/reader/json.ts
 ##
 @@ -0,0 +1,238 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import { flatbuffers } from 'flatbuffers';
+import { Vector } from '../vector/vector';
+import { TypedArray, TypedArrayConstructor } from '../vector/types';
+import { BinaryVector, BoolVector, Utf8Vector, Int8Vector,
+ Int16Vector, Int32Vector, Int64Vector, Uint8Vector,
+ Uint16Vector, Uint32Vector, Uint64Vector,
+ Float32Vector, Float64Vector, ListVector, StructVector } from 
'../vector/arrow';
+
+import { fb, FieldBuilder, FieldNodeBuilder } from '../format/arrow';
+
+import { TextEncoder } from 'text-encoding-utf-8';
+const encoder = new TextEncoder('utf-8');
+
+export function* readJSON(jsonString: string): IterableIterator<Vector[]> {
+let obj: any = JSON.parse(jsonString);
+let schema: any = {};
+for (const field of obj.schema.fields) {
+schema[field.name] = field;
+}
+
+for (const batch of obj.batches) {
+yield batch.columns.map((column: any): Vector => 
readVector(schema[column.name], column));
+}
+}
+
+function readVector(field: any, column: any): Vector {
+return readDictionaryVector(field, column) || readValueVector(field, 
column);
+}
+
+function readDictionaryVector(field: any, column: any) {
+if (field.name == column.name) { return null; } else { return null; }
+}
+
+function readValueVector(field: any, column: any): Vector {
+switch (field.type.name) {
+//case "NONE": return readNullVector(field, column);
+//case "null": return readNullVector(field, column);
+//case "map": return readMapVector(field, column);
+case 'int': return readIntVector(field, column);
+case 'bool': return readBoolVector(field, column);
+//case "date": return readDateVector(field, column);
+case 'list': return readListVector(field, column);
+case 'utf8': return readUtf8Vector(field, column);
+//case "time": return readTimeVector(field, column);
+//case "union": return readUnionVector(field, column);
+case 'binary': return readBinaryVector(field, column);
+//case "decimal": return readDecimalVector(field, column);
+case 'struct': return readStructVector(field, column);
+case 'floatingpoint': return readFloatVector(field, column);
+//case "timestamp": return readTimestampVector(field, column);
+//case "fixedsizelist": return readFixedSizeListVector(field, column);
+//case "fixedsizebinary": return readFixedSizeBinaryVector(field, 
column);
+}
+throw new Error(`Unrecognized Vector { name: ${field.name}, type: 
${field.type.name} }`);
+}
+
+function readIntVector(field: any, column: any) {
+if (field.type.isSigned) {
+switch (field.type.bitWidth) {
+case  8: return new  Int8Vector(readNumeric(field, column, 
Int8Array));
+case 16: return new Int16Vector(readNumeric(field, column, 
Int16Array));
+case 32: return new Int32Vector(readNumeric(field, column, 
Int32Array));
+case 64: return new Int64Vector(readInt64(field, column, 
Int32Array));
+}
+}
+switch (field.type.bitWidth) {
+case  8: return new  Uint8Vector(readNumeric(field, column, 
Uint8Array));
+case 16: return new Uint16Vector(readNumeric(field, column, 
Uint16Array));
+case 32: return new Uint32Vector(readNumeric(field, column, 
Uint32Array));
+case 64: return new Uint64Vector(readInt64(field, column, 
Uint32Array));
+}
+throw new Error(`Unrecognized Int { isSigned: ${field.type.isSigned}, 
bitWidth: ${field.type.bitWidth} }`);
+}
+
+function readBoolVector(fieldObj: any, column: any) {
+const field = fieldFromJSO

[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262339#comment-16262339
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on a change in pull request #1343: [WIP] ARROW-1832: [JS] 
Implement JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#discussion_r152536620
 
 

 ##
 File path: js/src/reader/json.ts
 ##
 @@ -0,0 +1,238 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import { flatbuffers } from 'flatbuffers';
+import { Vector } from '../vector/vector';
+import { TypedArray, TypedArrayConstructor } from '../vector/types';
+import { BinaryVector, BoolVector, Utf8Vector, Int8Vector,
+ Int16Vector, Int32Vector, Int64Vector, Uint8Vector,
+ Uint16Vector, Uint32Vector, Uint64Vector,
+ Float32Vector, Float64Vector, ListVector, StructVector } from 
'../vector/arrow';
+
+import { fb, FieldBuilder, FieldNodeBuilder } from '../format/arrow';
+
+import { TextEncoder } from 'text-encoding-utf-8';
+const encoder = new TextEncoder('utf-8');
+
+export function* readJSON(jsonString: string): IterableIterator<Vector[]> {
+let obj: any = JSON.parse(jsonString);
+let schema: any = {};
+for (const field of obj.schema.fields) {
+schema[field.name] = field;
+}
+
+for (const batch of obj.batches) {
+yield batch.columns.map((column: any): Vector => 
readVector(schema[column.name], column));
+}
+}
+
+function readVector(field: any, column: any): Vector {
+return readDictionaryVector(field, column) || readValueVector(field, 
column);
+}
+
+function readDictionaryVector(field: any, column: any) {
+if (field.name == column.name) { return null; } else { return null; }
+}
+
+function readValueVector(field: any, column: any): Vector {
+switch (field.type.name) {
+//case "NONE": return readNullVector(field, column);
+//case "null": return readNullVector(field, column);
+//case "map": return readMapVector(field, column);
+case 'int': return readIntVector(field, column);
+case 'bool': return readBoolVector(field, column);
+//case "date": return readDateVector(field, column);
+case 'list': return readListVector(field, column);
+case 'utf8': return readUtf8Vector(field, column);
+//case "time": return readTimeVector(field, column);
+//case "union": return readUnionVector(field, column);
+case 'binary': return readBinaryVector(field, column);
+//case "decimal": return readDecimalVector(field, column);
+case 'struct': return readStructVector(field, column);
+case 'floatingpoint': return readFloatVector(field, column);
+//case "timestamp": return readTimestampVector(field, column);
+//case "fixedsizelist": return readFixedSizeListVector(field, column);
+//case "fixedsizebinary": return readFixedSizeBinaryVector(field, 
column);
+}
+throw new Error(`Unrecognized Vector { name: ${field.name}, type: 
${field.type.name} }`);
+}
+
+function readIntVector(field: any, column: any) {
+if (field.type.isSigned) {
+switch (field.type.bitWidth) {
+case  8: return new  Int8Vector(readNumeric(field, column, 
Int8Array));
+case 16: return new Int16Vector(readNumeric(field, column, 
Int16Array));
+case 32: return new Int32Vector(readNumeric(field, column, 
Int32Array));
+case 64: return new Int64Vector(readInt64(field, column, 
Int32Array));
+}
+}
+switch (field.type.bitWidth) {
+case  8: return new  Uint8Vector(readNumeric(field, column, 
Uint8Array));
+case 16: return new Uint16Vector(readNumeric(field, column, 
Uint16Array));
+case 32: return new Uint32Vector(readNumeric(field, column, 
Uint32Array));
+case 64: return new Uint64Vector(readInt64(field, column, 
Uint32Array));
+}
+throw new Error(`Unrecognized Int { isSigned: ${field.type.isSigned}, 
bitWidth: ${field.type.bitWidth} }`);
+}
+
+function readBoolVector(fieldObj: any, column: any) {
+const field = fieldFromJSO

[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262346#comment-16262346
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on a change in pull request #1343: [WIP] ARROW-1832: [JS] 
Implement JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#discussion_r152537314
 
 

 ##
 File path: js/src/format/arrow.ts
 ##
 @@ -0,0 +1,61 @@
+import { flatbuffers } from 'flatbuffers';
+
+import * as Schema_ from './Schema';
+import * as Message_ from './Message';
+import * as File_ from './Message';
+
+export namespace fb {
+export import Schema = Schema_.org.apache.arrow.flatbuf;
+export import Message = Message_.org.apache.arrow.flatbuf;
+export import File = File_.org.apache.arrow.flatbuf;
+}
 
 Review comment:
   @TheNeuralBit ah, I misunderstood how you did these exports. In order to get 
Closure Compiler to work, we have to put the generated flatbuffers code in its 
own folder. We can move the generated files to `format/fb` and just re-export 
them from this file like you're doing here. I've got this working in a branch 
now; want me to open a PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262352#comment-16262352
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on a change in pull request #1343: [WIP] ARROW-1832: [JS] 
Implement JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#discussion_r152537908
 
 

 ##
 File path: js/src/vector/traits.ts
 ##
 @@ -17,11 +17,26 @@
 
 import { Vector } from './vector';
 import { BoolVector } from './numeric';
-import * as Schema_ from '../format/Schema';
-import * as Message_ from '../format/Message';
-import Type = Schema_.org.apache.arrow.flatbuf.Type;
-import Field = Schema_.org.apache.arrow.flatbuf.Field;
-import FieldNode = Message_.org.apache.arrow.flatbuf.FieldNode;
+import { fb, FieldBuilder, FieldNodeBuilder } from '../format/arrow';
+
+export type Field = ( fb.Schema.Field | FieldBuilder );
+export type FieldNode = ( fb.Message.FieldNode | FieldNodeBuilder );
 
 Review comment:
   This is great! I love it


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262414#comment-16262414
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on a change in pull request #1343: [WIP] ARROW-1832: [JS] 
Implement JSON reader for integration tests
URL: https://github.com/apache/arrow/pull/1343#discussion_r152544796
 
 

 ##
 File path: js/src/format/arrow.ts
 ##
 @@ -0,0 +1,61 @@
+import { flatbuffers } from 'flatbuffers';
+
+import * as Schema_ from './Schema';
+import * as Message_ from './Message';
+import * as File_ from './Message';
+
+export namespace fb {
+export import Schema = Schema_.org.apache.arrow.flatbuf;
+export import Message = Message_.org.apache.arrow.flatbuf;
+export import File = File_.org.apache.arrow.flatbuf;
+}
 
 Review comment:
   Ah, actually I spoke too soon. The way TS compiles namespaces to nested 
IIFEs in JS confuses Closure Compiler's mangler, so all the `fb.Schema.Foo` 
references get mangled to different names. I remember figuring this out when I 
first turned on CC, and that's why we do the ugly `import * as Schema_` and 
`import Type = Schema_.org.apache.arrow.flatbuf.Type` nonsense everywhere. Ugh.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1782) [Python] Expose compressors as pyarrow.compress, pyarrow.decompress

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262504#comment-16262504
 ] 

ASF GitHub Bot commented on ARROW-1782:
---

xhochy commented on a change in pull request #1345: ARROW-1782: [Python] Add 
pyarrow.compress, decompress APIs
URL: https://github.com/apache/arrow/pull/1345#discussion_r152561225
 
 

 ##
 File path: cpp/src/arrow/util/compression.h
 ##
 @@ -27,7 +27,7 @@
 namespace arrow {
 
 struct Compression {
-  enum type { UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, ZSTD, LZ4 };
+  enum type { UNCOMPRESSED, SNAPPY, GZIP, BROTLI, ZSTD, LZ4, LZO };
 
 Review comment:
   It is part of the Parquet standard, so Parquet will need it (although we 
don't support it at the moment). I'm not aware of how widely it is used, but no 
user has complained yet.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose compressors as pyarrow.compress, pyarrow.decompress
> ---
>
> Key: ARROW-1782
> URL: https://issues.apache.org/jira/browse/ARROW-1782
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> These should release the GIL, and serve as an alternative to the various 
> compressor wrapper libraries out there. They should have the ability to work 
> with {{pyarrow.Buffer}} or {{PyBytes}} as the user prefers
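
A rough sketch of how such an API might be used once exposed; the exact
parameters shown here (codec selection, the explicit decompressed_size, the
asbytes flag) are assumptions drawn from this description, not a confirmed
signature:

```
import pyarrow as pa

data = b'some bytes worth compressing' * 100

# Compress from bytes (or a pyarrow.Buffer); asbytes=False would return a Buffer.
compressed = pa.compress(data, codec='lz4', asbytes=True)

# The raw codecs carry no framing, so the caller supplies the original size.
restored = pa.decompress(compressed, decompressed_size=len(data),
                         codec='lz4', asbytes=True)
assert restored == data
```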



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1703) [C++] Vendor exact version of jemalloc we depend on

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262529#comment-16262529
 ] 

ASF GitHub Bot commented on ARROW-1703:
---

xhochy commented on issue #1334: ARROW-1703: [C++] Vendor exact version of 
jemalloc we depend on
URL: https://github.com/apache/arrow/pull/1334#issuecomment-346354979
 
 
   I don't expect that we will update `jemalloc` often. Hopefully, we can use 
the stable 5.x release series soon and then use official builds everywhere.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Vendor exact version of jemalloc we depend on
> ---
>
> Key: ARROW-1703
> URL: https://issues.apache.org/jira/browse/ARROW-1703
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since we are likely going to be using a patched jemalloc, we probably should 
> not support using jemalloc with any other version, or relying on system 
> packages. jemalloc would therefore always be built together with Arrow if 
> {{ARROW_JEMALLOC}} is on.
> For this reason I believe we should vendor the code at the pinned commit as 
> with Redis and other projects: 
> https://github.com/antirez/redis/tree/unstable/deps



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262721#comment-16262721
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

wesm commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346371702
 
 
   In

   ```
   # available at either
   npm install apache-arrow@0.1.3
   # or 
   npm install apache-arrow@v0.8.0
   ```
   
   I am concerned this might confuse more than help. As long as two 
implementations are using the same metadata version (e.g. V4), then the older 
metadata is supposed to be forward compatible. So an older JS version that uses 
V4 would be compatible with versions of the other Arrow libraries other than 
0.8.0. Is there somewhere else where we can document the version 
cross-compatibility?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262722#comment-16262722
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

wesm commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346371702
 
 
   In

   ```
   # available at either
   npm install apache-arrow@0.1.3
   # or 
   npm install apache-arrow@v0.8.0
   ```
   
   I am concerned this might confuse more than help. As long as two 
implementations are using the same metadata version (e.g. V4), then the older 
metadata is supposed to be forward / backward compatible. So an older JS 
version that uses V4 would be compatible with versions of the other Arrow 
libraries other than 0.8.0. Is there somewhere else where we can document the 
version cross-compatibility?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262767#comment-16262767
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

wesm commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346379338
 
 
   There are some missing steps in running the source release; can you let me 
know what I need to do?
   
   * Installed node with nvm `nvm install node`
   * Tried running dev/release/js-source-release.sh, failed due to missing gulp
   * Ran `npm install --global gulp`, then tried the script again, failed with:
   
   ```
   Preparing source for tag apache-arrow-js-0.2.0
   Using commit 6beed6abf3d288511a0766bf51a255842290032f
   
   > apache-arrow@0.2.0 clean:testdata /home/wesm/code/arrow/js
   > gulp clean:testdata
   
   module.js:544
   throw err;
   ^
   
   Error: Cannot find module 'del'
   at Function.Module._resolveFilename (module.js:542:15)
   at Function.Module._load (module.js:472:25)
   at Module.require (module.js:585:17)
   at require (internal/module.js:11:18)
   at Object.<anonymous> (/home/wesm/code/arrow/js/gulpfile.js:18:13)
   at Module._compile (module.js:641:30)
   at Object.Module._extensions..js (module.js:652:10)
   at Module.load (module.js:560:32)
   at tryModuleLoad (module.js:503:12)
   at Function.Module._load (module.js:495:3)
   npm ERR! code ELIFECYCLE
   npm ERR! errno 1
   npm ERR! apache-arrow@0.2.0 clean:testdata: `gulp clean:testdata`
   npm ERR! Exit status 1
   npm ERR! 
   npm ERR! Failed at the apache-arrow@0.2.0 clean:testdata script.
   npm ERR! This is probably not a problem with npm. There is likely additional 
logging output above.
   npm WARN Local package.json exists, but node_modules missing, did you mean 
to install?
   
   npm ERR! A complete log of this run can be found in:
   npm ERR! /home/wesm/.npm/_logs/2017-11-22T15_10_58_135Z-debug.log
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262773#comment-16262773
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152593713
 
 

 ##
 File path: java/vector/src/main/java/org/apache/arrow/vector/BitVector.java
 ##
 @@ -18,342 +18,469 @@
 
 package org.apache.arrow.vector;
 
+import io.netty.buffer.ArrowBuf;
 import org.apache.arrow.memory.BufferAllocator;
-import org.apache.arrow.memory.BaseAllocator;
-import org.apache.arrow.memory.OutOfMemoryException;
+import org.apache.arrow.vector.complex.impl.BitReaderImpl;
 import org.apache.arrow.vector.complex.reader.FieldReader;
 import org.apache.arrow.vector.holders.BitHolder;
 import org.apache.arrow.vector.holders.NullableBitHolder;
-import org.apache.arrow.vector.schema.ArrowFieldNode;
-import org.apache.arrow.vector.types.Types.MinorType;
-import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.Types;
+import org.apache.arrow.vector.types.pojo.FieldType;
 import org.apache.arrow.vector.util.OversizedAllocationException;
 import org.apache.arrow.vector.util.TransferPair;
 
-import io.netty.buffer.ArrowBuf;
-
 /**
- * Bit implements a vector of bit-width values. Elements in the vector are 
accessed by position from the logical start
- * of the vector. The width of each element is 1 bit. The equivalent Java 
primitive is an int containing the value '0'
- * or '1'.
+ * BitVector implements a fixed width (1 bit) vector of
+ * boolean values which could be null. Each value in the vector corresponds
+ * to a single bit in the underlying data stream backing the vector.
  */
-public final class BitVector extends BaseDataValueVector implements 
FixedWidthVector {
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BitVector.class);
-
-  private final Accessor accessor = new Accessor();
-  private final Mutator mutator = new Mutator();
-
-  int valueCount;
-  private int allocationSizeInBytes = 
getSizeFromCount(INITIAL_VALUE_ALLOCATION);
-  private int allocationMonitor = 0;
+public class BitVector extends BaseFixedWidthVector {
+  private final FieldReader reader;
 
+  /**
+   * Instantiate a BitVector. This doesn't allocate any memory for
+   * the data in vector.
+   *
+   * @param name  name of the vector
+   * @param allocator allocator for memory management.
+   */
   public BitVector(String name, BufferAllocator allocator) {
-super(name, allocator);
+this(name, FieldType.nullable(Types.MinorType.BIT.getType()),
+allocator);
   }
 
-  @Override
-  public void load(ArrowFieldNode fieldNode, ArrowBuf data) {
-// When the vector is all nulls or all defined, the content of the buffer 
can be omitted
-if (data.readableBytes() == 0 && fieldNode.getLength() != 0) {
-  int count = fieldNode.getLength();
-  allocateNew(count);
-  int n = getSizeFromCount(count);
-  if (fieldNode.getNullCount() == 0) {
-// all defined
-// create an all 1s buffer
-// set full bytes
-int fullBytesCount = count / 8;
-for (int i = 0; i < fullBytesCount; ++i) {
-  this.data.setByte(i, 0xFF);
-}
-int remainder = count % 8;
-// set remaining bits
-if (remainder > 0) {
-  byte bitMask = (byte) (0xFFL >>> ((8 - remainder) & 7));
-  this.data.setByte(fullBytesCount, bitMask);
-}
-  } else if (fieldNode.getNullCount() == fieldNode.getLength()) {
-// all null
-// create an all 0s buffer
-zeroVector();
-  } else {
-throw new IllegalArgumentException("The buffer can be empty only if 
there's no data or it's all null or all defined");
-  }
-  this.data.writerIndex(n);
-} else {
-  super.load(fieldNode, data);
-}
-this.valueCount = fieldNode.getLength();
+  /**
+   * Instantiate a BitVector. This doesn't allocate any memory for
+   * the data in vector.
+   *
+   * @param name  name of the vector
+   * @param fieldType type of Field materialized by this vector
+   * @param allocator allocator for memory management.
+   */
+  public BitVector(String name, FieldType fieldType, BufferAllocator 
allocator) {
+super(name, allocator, fieldType, (byte) 0);
+reader = new BitReaderImpl(BitVector.this);
   }
 
+  /**
+   * Get a reader that supports reading values from this vector
+   *
+   * @return Field Reader for this vector
+   */
   @Override
-  public Field getField() {
-throw new UnsupportedOperationException("internal vector");
+  public FieldReader getReader() {
+return reader;
   }
 
+  /**
+   * Get minor type for this vector. The vector holds values belonging
+   * to a particular type.
+   *

[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262789#comment-16262789
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152596476
 
 

 ##
 File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java
 ##
 @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() {
   public void testBitVector() {
 // Create a new value vector for 1024 integers
 try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) 
{
-  final BitVector.Mutator m = vector.getMutator();
   vector.allocateNew(1024);
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
   // Put and set a few values
-  m.set(0, 1);
-  m.set(1, 0);
-  m.set(100, 0);
-  m.set(1022, 1);
+  vector.set(0, 1);
+  vector.set(1, 0);
+  vector.set(100, 0);
+  vector.set(1022, 1);
 
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
-  final BitVector.Accessor accessor = vector.getAccessor();
-  assertEquals(1, accessor.get(0));
-  assertEquals(0, accessor.get(1));
-  assertEquals(0, accessor.get(100));
-  assertEquals(1, accessor.get(1022));
+  assertEquals(1, vector.get(0));
+  assertEquals(0, vector.get(1));
+  assertEquals(0, vector.get(100));
+  assertEquals(1, vector.get(1022));
 
-  assertEquals(1022, accessor.getNullCount());
+  assertEquals(1020, vector.getNullCount());
 
 Review comment:
   For migration purposes, should we provide an alternative method that 
preserves the old behavior? Something like `getNullAndUnsetCount()`. Seems like 
a weird API though.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far, the consensus seems to be to remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262788#comment-16262788
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152596476
 
 

 ##
 File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java
 ##
 @@ -426,81 +405,81 @@ public void testReallocAfterVectorTransfer2() {
   public void testBitVector() {
 // Create a new value vector for 1024 integers
 try (final BitVector vector = new BitVector(EMPTY_SCHEMA_PATH, allocator)) 
{
-  final BitVector.Mutator m = vector.getMutator();
   vector.allocateNew(1024);
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
   // Put and set a few values
-  m.set(0, 1);
-  m.set(1, 0);
-  m.set(100, 0);
-  m.set(1022, 1);
+  vector.set(0, 1);
+  vector.set(1, 0);
+  vector.set(100, 0);
+  vector.set(1022, 1);
 
-  m.setValueCount(1024);
+  vector.setValueCount(1024);
 
-  final BitVector.Accessor accessor = vector.getAccessor();
-  assertEquals(1, accessor.get(0));
-  assertEquals(0, accessor.get(1));
-  assertEquals(0, accessor.get(100));
-  assertEquals(1, accessor.get(1022));
+  assertEquals(1, vector.get(0));
+  assertEquals(0, vector.get(1));
+  assertEquals(0, vector.get(100));
+  assertEquals(1, vector.get(1022));
 
-  assertEquals(1022, accessor.getNullCount());
+  assertEquals(1020, vector.getNullCount());
 
 Review comment:
   For migration purposes, should we provide an alternative method that 
preserves the old behavior? Something like `getNullAndUnsetCount()`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far, the consensus seems to be to remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262792#comment-16262792
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152597645
 
 

 ##
 File path: java/vector/src/test/java/org/apache/arrow/vector/TestBitVector.java
 ##
 @@ -526,15 +505,17 @@ private void validateRange(int length, int start, int 
count) {
 try (BitVector bitVector = new BitVector("bits", allocator)) {
   bitVector.reset();
   bitVector.allocateNew(length);
-  bitVector.getMutator().setRangeToOne(start, count);
+  for (int i = start; i < start + count; i++) {
+bitVector.set(i, 1);
+  }
 
 Review comment:
   Should we consider adding `setRangeToOne` back to `BitVector`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far, the consensus seems to be to remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262798#comment-16262798
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152598542
 
 

 ##
 File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestOversizedAllocationForValueVector.java
 ##
 @@ -112,10 +112,10 @@ public void testVariableVectorReallocation() {
 try {
   vector.allocateNew(expectedAllocationInBytes, 10);
   assertTrue(expectedOffsetSize <= vector.getValueCapacity());
-  assertTrue(expectedAllocationInBytes <= vector.getBuffer().capacity());
+  assertTrue(expectedAllocationInBytes <= 
vector.getDataBuffer().capacity());
 
 Review comment:
   Why is this checking the DataBuffer, while the check after reAlloc checks 
the OffsetBuffer?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far, the consensus seems to be to remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262803#comment-16262803
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346386518
 
 
   @BryanCutler High level looks good to me. Two major comments:
   * Do we want to port the missing functions (`getNullCount()` and 
`setRangeToOne`) to the new bit vector?
   * Can we remove `NonNullableMapVector` altogether? (i.e. get rid of the 
`MapVector extends NonNullableMapVector` inheritance and roll them into a 
single class)
   
   Happy Thanksgiving!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far, the consensus seems to be to remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262805#comment-16262805
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on issue #1330: ARROW-1816: [Java] Resolve new vector 
classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#issuecomment-346236125
 
 
   @BryanCutler I am a bit reluctant to check `unit` and `timezone` in value 
holders for performance reasons. This is currently not checked for other types 
with type parameters either, such as decimal. Maybe we should revisit this 
problem as a whole in a follow-up?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread Licht Takeuchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Licht Takeuchi reassigned ARROW-1758:
-

Assignee: Licht Takeuchi

> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.
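
A minimal sketch of the replacement pattern, assuming the pyarrow
SerializationContext.register_type API referenced in the linked discussion (the
class `Foo` is hypothetical):

```
import pickle
import pyarrow as pa

class Foo(object):
    def __init__(self, value):
        self.value = value

context = pa.SerializationContext()
# Instead of register_type(Foo, 'Foo', pickle=True), pass pickle's functions
# explicitly; this makes the chosen pickler obvious and easy to swap out
# (e.g. for cloudpickle.dumps / cloudpickle.loads).
context.register_type(Foo, 'Foo',
                      custom_serializer=pickle.dumps,
                      custom_deserializer=pickle.loads)

buf = pa.serialize(Foo(42), context=context).to_buffer()
obj = pa.deserialize(buf, context=context)
assert obj.value == 42
```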



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1758:
--
Labels: pull-request-available  (was: )

> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262867#comment-16262867
 ] 

ASF GitHub Bot commented on ARROW-1758:
---

Licht-T opened a new pull request #1347: ARROW-1758: [Python] Remove 
pickle=True option for object serialization
URL: https://github.com/apache/arrow/pull/1347
 
 
   This closes [ARROW-1758](https://issues.apache.org/jira/browse/ARROW-1758).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262873#comment-16262873
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152614571
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   Sure, I'm fine with this. I'll change it now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1831) [Python] Docker-based documentation build does not properly set LD_LIBRARY_PATH

2017-11-22 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-1831.

Resolution: Invalid

This was not the root cause of the problems.

> [Python] Docker-based documentation build does not properly set 
> LD_LIBRARY_PATH
> ---
>
> Key: ARROW-1831
> URL: https://issues.apache.org/jira/browse/ARROW-1831
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
> Fix For: 0.8.0
>
>
> see https://github.com/apache/arrow/issues/1324



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262907#comment-16262907
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on issue #1330: ARROW-1816: [Java] Resolve new vector 
classes structure for timestamp, date and maybe interval
URL: https://github.com/apache/arrow/pull/1330#issuecomment-346408827
 
 
   > @BryanCutler I am a bit reluctant to check unit and timezone in value 
holders because of performance reasons.
   
   Yeah, it doesn't make sense to do all these checks on each access, so I just 
wanted to pose the question to make sure it wasn't a blocker for this 
refactor. I don't use the holder APIs, so it's fine with me, but maybe 
@siddharthteotia has some thoughts on this?
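   For readers following the holder discussion, here is a minimal standalone 
sketch of the kind of per-write validation being weighed. The class and method 
names below are hypothetical and are not part of the Arrow Java API; the point 
is only that a checked path adds a branch plus a string comparison to every 
holder-based write, which is the performance cost under discussion.

{code:java}
// Hypothetical sketch, not Arrow API: contrasts an unchecked holder write
// with one that validates unit/timezone on every call.
final class TimestampHolderSketch {
  long value;
  java.util.concurrent.TimeUnit unit;
  String timezone;
}

final class TimestampVectorSketch {
  private final java.util.concurrent.TimeUnit unit;
  private final String timezone;
  private final long[] data;
  private int index;

  TimestampVectorSketch(java.util.concurrent.TimeUnit unit, String timezone, int capacity) {
    this.unit = unit;
    this.timezone = timezone;
    this.data = new long[capacity];
  }

  // Unchecked path: a single array store per value.
  void set(TimestampHolderSketch holder) {
    data[index++] = holder.value;
  }

  // Checked path: an extra branch plus a string comparison per value.
  void setChecked(TimestampHolderSketch holder) {
    if (holder.unit != unit || !java.util.Objects.equals(holder.timezone, timezone)) {
      throw new IllegalArgumentException("holder type parameters do not match vector");
    }
    data[index++] = holder.value;
  }
}
{code}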
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262925#comment-16262925
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346415200
 
 
   LGTM. +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262940#comment-16262940
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

BryanCutler commented on a change in pull request #1330: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval 
URL: https://github.com/apache/arrow/pull/1330#discussion_r152630009
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/types/TimeUnit.java
 ##
 @@ -19,10 +19,10 @@
 package org.apache.arrow.vector.types;
 
 public enum TimeUnit {
-  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND),
-  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND),
-  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND),
-  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND);
+  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND, 
java.util.concurrent.TimeUnit.SECONDS),
+  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND, 
java.util.concurrent.TimeUnit.MILLISECONDS),
+  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND, 
java.util.concurrent.TimeUnit.MICROSECONDS),
+  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND, 
java.util.concurrent.TimeUnit.NANOSECONDS);
 
 Review comment:
   Well, before, the enum was defined by the flatbuf ID alone, but here it 
carries both the flatbuf ID and the Java TimeUnit, which is redundant 
information. So a check for equality would have to look at both of these 
fields. Maybe not a big deal though.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262948#comment-16262948
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152630603
 
 

 ##
 File path: java/vector/src/main/java/org/apache/arrow/vector/BitVector.java
 ##
 @@ -18,342 +18,469 @@
 
 package org.apache.arrow.vector;
 
+import io.netty.buffer.ArrowBuf;
 import org.apache.arrow.memory.BufferAllocator;
-import org.apache.arrow.memory.BaseAllocator;
-import org.apache.arrow.memory.OutOfMemoryException;
+import org.apache.arrow.vector.complex.impl.BitReaderImpl;
 import org.apache.arrow.vector.complex.reader.FieldReader;
 import org.apache.arrow.vector.holders.BitHolder;
 import org.apache.arrow.vector.holders.NullableBitHolder;
-import org.apache.arrow.vector.schema.ArrowFieldNode;
-import org.apache.arrow.vector.types.Types.MinorType;
-import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.Types;
+import org.apache.arrow.vector.types.pojo.FieldType;
 import org.apache.arrow.vector.util.OversizedAllocationException;
 import org.apache.arrow.vector.util.TransferPair;
 
-import io.netty.buffer.ArrowBuf;
-
 /**
- * Bit implements a vector of bit-width values. Elements in the vector are 
accessed by position from the logical start
- * of the vector. The width of each element is 1 bit. The equivalent Java 
primitive is an int containing the value '0'
- * or '1'.
+ * BitVector implements a fixed width (1 bit) vector of
+ * boolean values which could be null. Each value in the vector corresponds
+ * to a single bit in the underlying data stream backing the vector.
  */
-public final class BitVector extends BaseDataValueVector implements 
FixedWidthVector {
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BitVector.class);
-
-  private final Accessor accessor = new Accessor();
-  private final Mutator mutator = new Mutator();
-
-  int valueCount;
-  private int allocationSizeInBytes = 
getSizeFromCount(INITIAL_VALUE_ALLOCATION);
-  private int allocationMonitor = 0;
+public class BitVector extends BaseFixedWidthVector {
+  private final FieldReader reader;
 
+  /**
+   * Instantiate a BitVector. This doesn't allocate any memory for
+   * the data in vector.
+   *
+   * @param name  name of the vector
+   * @param allocator allocator for memory management.
+   */
   public BitVector(String name, BufferAllocator allocator) {
-super(name, allocator);
+this(name, FieldType.nullable(Types.MinorType.BIT.getType()),
+allocator);
   }
 
-  @Override
-  public void load(ArrowFieldNode fieldNode, ArrowBuf data) {
-// When the vector is all nulls or all defined, the content of the buffer 
can be omitted
-if (data.readableBytes() == 0 && fieldNode.getLength() != 0) {
-  int count = fieldNode.getLength();
-  allocateNew(count);
-  int n = getSizeFromCount(count);
-  if (fieldNode.getNullCount() == 0) {
-// all defined
-// create an all 1s buffer
-// set full bytes
-int fullBytesCount = count / 8;
-for (int i = 0; i < fullBytesCount; ++i) {
-  this.data.setByte(i, 0xFF);
-}
-int remainder = count % 8;
-// set remaining bits
-if (remainder > 0) {
-  byte bitMask = (byte) (0xFFL >>> ((8 - remainder) & 7));
-  this.data.setByte(fullBytesCount, bitMask);
-}
-  } else if (fieldNode.getNullCount() == fieldNode.getLength()) {
-// all null
-// create an all 0s buffer
-zeroVector();
-  } else {
-throw new IllegalArgumentException("The buffer can be empty only if 
there's no data or it's all null or all defined");
-  }
-  this.data.writerIndex(n);
-} else {
-  super.load(fieldNode, data);
-}
-this.valueCount = fieldNode.getLength();
+  /**
+   * Instantiate a BitVector. This doesn't allocate any memory for
+   * the data in vector.
+   *
+   * @param name  name of the vector
+   * @param fieldType type of Field materialized by this vector
+   * @param allocator allocator for memory management.
+   */
+  public BitVector(String name, FieldType fieldType, BufferAllocator 
allocator) {
+super(name, allocator, fieldType, (byte) 0);
+reader = new BitReaderImpl(BitVector.this);
   }
 
+  /**
+   * Get a reader that supports reading values from this vector
+   *
+   * @return Field Reader for this vector
+   */
   @Override
-  public Field getField() {
-throw new UnsupportedOperationException("internal vector");
+  public FieldReader getReader() {
+return reader;
   }
 
+  /**
+   * Get minor type for this vector. The vector holds values belonging
+   * to a particular type.
+   *

[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262971#comment-16262971
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

BryanCutler commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346422375
 
 
   > Do we want to port the missing functions (getNullCount() and setRangeToOne) 
to the new bit vector?
   
   The new vector has `getNullCount()`, but it only counts nulls; maybe the 
equivalent would be `getZeroCount()` (a rough sketch follows below), though I'm 
not sure how useful that really is. I think I saw there was also a 
`setRangeToOne` and I'm ok with adding those.
   
   > Can we remove NonNullableMapVector altogether? (get rid of the MapVector 
extends NonNullableMapVector inheritance and roll them into a single class)
   
   I was thinking this is what we were going to do, maybe as a follow-up though, 
so +1 for me. One thing I'm not sure of: is nullability useful for this vector? 
For example, Spark StructType doesn't have a nullable param; it's up to the 
child type definitions. But maybe it's different in the Arrow sense, I'll have 
to think about that.
   
   Happy Thanksgiving to you all too! You guys work too hard, I don't want to 
see any PRs coming through for a couple days :)
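   A `getZeroCount()` equivalent could be a short loop. This is only a rough 
sketch of a hypothetical helper, not part of the PR, and it assumes the new 
BitVector exposes `getValueCount()`, `isNull(int)` and a `get(int)` returning 
0 or 1, as in the new fixed-width hierarchy.

{code:java}
import org.apache.arrow.vector.BitVector;

final class BitVectorSketches {
  // Hypothetical helper, not in the PR: counts entries that are set (non-null)
  // but hold the value 0, i.e. the complement of a "count of ones".
  static int getZeroCount(BitVector vector) {
    int zeros = 0;
    for (int i = 0; i < vector.getValueCount(); i++) {
      if (!vector.isNull(i) && vector.get(i) == 0) {
        zeros++;
      }
    }
    return zeros;
  }
}
{code}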


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1844) [C++] Basic benchmark suite for hash kernels

2017-11-22 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1844:
---

 Summary: [C++] Basic benchmark suite for hash kernels
 Key: ARROW-1844
 URL: https://issues.apache.org/jira/browse/ARROW-1844
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.8.0


* Integers, small cardinality and large cardinality
* Short strings, small/large cardinality
* Long strings, small/large cardinality

These benchmarks will enable us to refactor without fear, and to experiment 
with faster hash functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263005#comment-16263005
 ] 

ASF GitHub Bot commented on ARROW-1758:
---

wesm commented on a change in pull request #1347: ARROW-1758: [Python] Remove 
pickle=True option for object serialization
URL: https://github.com/apache/arrow/pull/1347#discussion_r152638013
 
 

 ##
 File path: python/pyarrow/serialization.py
 ##
 @@ -67,9 +68,11 @@ def _deserialize_default_dict(data):
 
 serialization_context.register_type(
 type(lambda: 0), "function",
-pickle=True)
+custom_serializer=pickle.dumps, custom_deserializer=pickle.loads)
 
-serialization_context.register_type(type, "type", pickle=True)
+serialization_context.register_type(type, "type",
+custom_serializer=pickle.dumps,
+custom_deserializer=pickle.loads)
 
 Review comment:
   If no custom serializer/deserializer is passed, should this default to 
`pickle`? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-1845:
--

 Summary: [Python] Expose Decimal128Type
 Key: ARROW-1845
 URL: https://issues.apache.org/jira/browse/ARROW-1845
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.8.0


In the course of the renaming, we forgot to update the Python code to the new 
{{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-1845:
--

Assignee: Uwe L. Korn

> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263020#comment-16263020
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface 
for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346429291
 
 
   Squashed and rebased so we can get a passing build. While we are waiting, do 
we also want the `vector.ipc.message` subnamespace? I don't have a strong feeling.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263024#comment-16263024
 ] 

ASF GitHub Bot commented on ARROW-1758:
---

pcmoritz commented on a change in pull request #1347: ARROW-1758: [Python] 
Remove pickle=True option for object serialization
URL: https://github.com/apache/arrow/pull/1347#discussion_r152641070
 
 

 ##
 File path: python/pyarrow/serialization.py
 ##
 @@ -67,9 +68,11 @@ def _deserialize_default_dict(data):
 
 serialization_context.register_type(
 type(lambda: 0), "function",
-pickle=True)
+custom_serializer=pickle.dumps, custom_deserializer=pickle.loads)
 
-serialization_context.register_type(type, "type", pickle=True)
+serialization_context.register_type(type, "type",
+custom_serializer=pickle.dumps,
+custom_deserializer=pickle.loads)
 
 Review comment:
   Currently it defaults to using __dict__, which is more efficient, so it seems 
like a good default; we use that in Ray for pretty much all types (though I have 
rarely seen cases where it doesn't work). I'd prefer to keep __dict__ as the 
default, but no big deal since it can easily be changed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1846) [C++] Implement "any" reduction kernel for boolean data, with the ability to short circuit when applying on chunked data

2017-11-22 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1846:
---

 Summary: [C++] Implement "any" reduction kernel for boolean data, 
with the ability to short circuit when applying on chunked data
 Key: ARROW-1846
 URL: https://issues.apache.org/jira/browse/ARROW-1846
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263034#comment-16263034
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346431446
 
 
   I do not have a strong feeling either; I think the `vector.ipc.message` 
subnamespace is fine. Although maybe we can move `ArrowMagic` to the `message` 
subnamespace? Sorry for the oversight. @BryanCutler, what do you think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263038#comment-16263038
 ] 

ASF GitHub Bot commented on ARROW-1816:
---

icexelloss commented on a change in pull request #1330: ARROW-1816: [Java] 
Resolve new vector classes structure for timestamp, date and maybe interval  
URL: https://github.com/apache/arrow/pull/1330#discussion_r152642480
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/types/TimeUnit.java
 ##
 @@ -19,10 +19,10 @@
 package org.apache.arrow.vector.types;
 
 public enum TimeUnit {
-  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND),
-  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND),
-  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND),
-  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND);
+  SECOND(org.apache.arrow.flatbuf.TimeUnit.SECOND, 
java.util.concurrent.TimeUnit.SECONDS),
+  MILLISECOND(org.apache.arrow.flatbuf.TimeUnit.MILLISECOND, 
java.util.concurrent.TimeUnit.MILLISECONDS),
+  MICROSECOND(org.apache.arrow.flatbuf.TimeUnit.MICROSECOND, 
java.util.concurrent.TimeUnit.MICROSECONDS),
+  NANOSECOND(org.apache.arrow.flatbuf.TimeUnit.NANOSECOND, 
java.util.concurrent.TimeUnit.NANOSECONDS);
 
 Review comment:
   Enum equality should just be identity check:
   https://stackoverflow.com/questions/533922/is-it-ok-to-use-on-enums-in-java
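   A small standalone illustration (not the Arrow TimeUnit itself): enum 
constants are singletons, so `==` is an identity check and extra fields, such 
as a mapped java.util.concurrent.TimeUnit, never need to be compared.

{code:java}
// Standalone example: adding fields to an enum does not affect equality,
// because each constant is a single instance and Enum.equals delegates to ==.
enum UnitExample {
  SECOND(java.util.concurrent.TimeUnit.SECONDS),
  MILLISECOND(java.util.concurrent.TimeUnit.MILLISECONDS);

  private final java.util.concurrent.TimeUnit javaUnit;

  UnitExample(java.util.concurrent.TimeUnit javaUnit) {
    this.javaUnit = javaUnit;
  }
}

class EnumEqualityDemo {
  public static void main(String[] args) {
    UnitExample a = UnitExample.SECOND;
    UnitExample b = UnitExample.valueOf("SECOND");
    System.out.println(a == b);       // true: same constant instance
    System.out.println(a.equals(b));  // true: Enum.equals is reference equality
  }
}
{code}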


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1844) [C++] Basic benchmark suite for hash kernels

2017-11-22 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1844:
---

Assignee: Wes McKinney

> [C++] Basic benchmark suite for hash kernels
> 
>
> Key: ARROW-1844
> URL: https://issues.apache.org/jira/browse/ARROW-1844
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> * Integers, small cardinality and large cardinality
> * Short strings, small/large cardinality
> * Long strings, small/large cardinality
> These benchmarks will enable us to refactor without fear, and to experiment 
> with faster hash functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (ARROW-1813) Enforce checkstyle failure in JAVA build and fix all checkstyle

2017-11-22 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1813.
---
Resolution: Duplicate

Duplicate of ARROW-1688

> Enforce  checkstyle failure in JAVA build and fix all checkstyle
> 
>
> Key: ARROW-1813
> URL: https://issues.apache.org/jira/browse/ARROW-1813
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1688) [Java] Fail build on checkstyle warnings

2017-11-22 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1688:
---

Assignee: Siddharth Teotia

> [Java] Fail build on checkstyle warnings
> 
>
> Key: ARROW-1688
> URL: https://issues.apache.org/jira/browse/ARROW-1688
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Siddharth Teotia
> Fix For: 0.8.0
>
>
> see discussion in ARROW-1474



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263122#comment-16263122
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface 
for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346440612
 
 
   Reviewing the past comments: since these classes are generally internal, I 
think it's fine. master is broken right now (ARROW-1845), so I will merge this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263125#comment-16263125
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm closed pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface 
for Stream Format
URL: https://github.com/apache/arrow/pull/1259
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java 
b/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
index 3091bc4da..ce6b5164a 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.ArrowStreamWriter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java 
b/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
index ab8fa6e45..6e45305bf 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
@@ -22,8 +22,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.file.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
 import org.apache.arrow.vector.types.pojo.Schema;
 import org.apache.commons.cli.CommandLine;
 import org.apache.commons.cli.CommandLineParser;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java 
b/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
index 6722b30fa..3db01f40c 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
@@ -21,8 +21,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowStreamWriter;
 
 import java.io.File;
 import java.io.FileInputStream;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/Integration.java 
b/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
index d2b35e65a..666f1ddea 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
@@ -22,11 +22,11 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowBlock;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.file.ArrowFileWriter;
-import org.apache.arrow.vector.file.json.JsonFileReader;
-import org.apache.arrow.vector.file.json.JsonFileWriter;
+import org.apache.arrow.vector.ipc.message.ArrowBlock;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.JsonFileReader;
+import org.apache.arrow.vector.ipc.JsonFileWriter;
 import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
 import org.apache.arrow.vector.types.pojo.Field;
 import org.apache.arrow.vector.types.pojo.Schema;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java 
b/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
index ef1a11f6b..42d336af9 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
@@ -21,8 +21,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileWriter;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
 
 import java.io.File;
 impor

[jira] [Updated] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1845:
--
Labels: pull-request-available  (was: )

> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1047.
-
   Resolution: Fixed
Fix Version/s: 0.8.0

Issue resolved by pull request 1259
[https://github.com/apache/arrow/pull/1259]

> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263129#comment-16263129
 ] 

ASF GitHub Bot commented on ARROW-1845:
---

xhochy opened a new pull request #1348: ARROW-1845: [Python] Expose 
Decimal128Type
URL: https://github.com/apache/arrow/pull/1348
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263161#comment-16263161
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346444019
 
 
   I'd like to keep the `vector.ipc.message` package; I think these generally 
define messages that serialize to FB.
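   For orientation, a minimal read-loop sketch under the package layout in this 
PR is shown below. The import paths for ArrowStreamReader and ArrowBlock are 
taken from the diff; the reader calls (`getVectorSchemaRoot()`, a 
`loadNextBatch()` assumed to return false at end of stream, and 
try-with-resources support) are assumptions about the relocated API, not a 
definitive usage guide.

{code:java}
import java.nio.channels.ReadableByteChannel;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;      // readers/writers live under vector.ipc
import org.apache.arrow.vector.ipc.message.ArrowBlock;     // framing-level classes live under vector.ipc.message

class StreamReadSketch {
  // Reads every record batch from an Arrow stream on the given channel.
  static void readAll(ReadableByteChannel in) throws Exception {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
      VectorSchemaRoot root = reader.getVectorSchemaRoot();
      while (reader.loadNextBatch()) {  // assumed: false once the stream is exhausted
        System.out.println("read batch with " + root.getRowCount() + " rows");
      }
    }
  }
}
{code}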


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263162#comment-16263162
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346444248
 
 
   Thanks @wesm @icexelloss @elahrvivaz and @siddharthteotia!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263181#comment-16263181
 ] 

ASF GitHub Bot commented on ARROW-1845:
---

wesm commented on a change in pull request #1348: ARROW-1845: [Python] Expose 
Decimal128Type
URL: https://github.com/apache/arrow/pull/1348#discussion_r152656364
 
 

 ##
 File path: python/pyarrow/types.pxi
 ##
 @@ -967,7 +983,25 @@ cpdef DataType decimal(int precision, int scale=0):
 decimal_type : DecimalType
 """
 cdef shared_ptr[CDataType] decimal_type
-decimal_type.reset(new CDecimalType(precision, scale))
+decimal_type.reset(new CDecimalType(byte_width, precision, scale))
+return pyarrow_wrap_data_type(decimal_type)
 
 Review comment:
   `arrow::DecimalType` is the base type now -- I don't think we should allow 
users to create instances of it


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263185#comment-16263185
 ] 

ASF GitHub Bot commented on ARROW-1845:
---

xhochy commented on a change in pull request #1348: ARROW-1845: [Python] Expose 
Decimal128Type
URL: https://github.com/apache/arrow/pull/1348#discussion_r152656705
 
 

 ##
 File path: python/pyarrow/types.pxi
 ##
 @@ -967,7 +983,25 @@ cpdef DataType decimal(int precision, int scale=0):
 decimal_type : DecimalType
 """
 cdef shared_ptr[CDataType] decimal_type
-decimal_type.reset(new CDecimalType(precision, scale))
+decimal_type.reset(new CDecimalType(byte_width, precision, scale))
+return pyarrow_wrap_data_type(decimal_type)
 
 Review comment:
   Should I simply remove this function then?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263190#comment-16263190
 ] 

ASF GitHub Bot commented on ARROW-1845:
---

wesm commented on a change in pull request #1348: ARROW-1845: [Python] Expose 
Decimal128Type
URL: https://github.com/apache/arrow/pull/1348#discussion_r152657140
 
 

 ##
 File path: python/pyarrow/types.pxi
 ##
 @@ -967,7 +983,25 @@ cpdef DataType decimal(int precision, int scale=0):
 decimal_type : DecimalType
 """
 cdef shared_ptr[CDataType] decimal_type
-decimal_type.reset(new CDecimalType(precision, scale))
+decimal_type.reset(new CDecimalType(byte_width, precision, scale))
+return pyarrow_wrap_data_type(decimal_type)
 
 Review comment:
   Actually, that's OK to leave


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1847) [Doc] Document the difference between RecordBatch and Table in an FAQ fashion

2017-11-22 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-1847:
--

 Summary: [Doc] Document the difference between RecordBatch and 
Table in an FAQ fashion
 Key: ARROW-1847
 URL: https://issues.apache.org/jira/browse/ARROW-1847
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Documentation, Python
Reporter: Uwe L. Korn






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263189#comment-16263189
 ] 

ASF GitHub Bot commented on ARROW-1845:
---

wesm commented on a change in pull request #1348: ARROW-1845: [Python] Expose 
Decimal128Type
URL: https://github.com/apache/arrow/pull/1348#discussion_r152657056
 
 

 ##
 File path: python/pyarrow/types.pxi
 ##
 @@ -967,7 +983,25 @@ cpdef DataType decimal(int precision, int scale=0):
 decimal_type : DecimalType
 """
 cdef shared_ptr[CDataType] decimal_type
-decimal_type.reset(new CDecimalType(precision, scale))
+decimal_type.reset(new CDecimalType(byte_width, precision, scale))
+return pyarrow_wrap_data_type(decimal_type)
 
 Review comment:
   Yes, and the `CDecimalType` class in libarrow.pxd


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263229#comment-16263229
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss opened a new pull request #1349: ARROW-1047: [Java] [FollowUp] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1349
 
 
   Move ArrowMagic from vector.ipc to vector.ipc.message package.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 
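
As a rough illustration of the framing idea described above, here is a Python sketch in which each record batch travels as its own opaque payload that any transport (gRPC, HTTP, ...) could wrap in a message such as ArrowMessagePB. The Java and C++ interfaces are the actual deliverable of this issue; this is only an analogy using pyarrow helpers:

```
import pyarrow as pa

batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ['a'])

# Serialize the batch as a standalone IPC message body rather than as part of
# one contiguous stream; the transport can frame these bytes however it likes.
payload = batch.serialize()                 # pyarrow.Buffer

# Receiver side: rebuild the batch, given a schema transmitted separately
# (e.g. as its own framed message).
received = pa.read_record_batch(payload, batch.schema)
assert received.equals(batch)
```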



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263230#comment-16263230
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1349: ARROW-1047: [Java] [FollowUp] Add Generic 
Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346454899
 
 
   cc @BryanCutler 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-22 Thread Li Jin (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-1816:
-

Assignee: Li Jin

> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1710) [Java] Remove non-nullable vectors in new vector class hierarchy

2017-11-22 Thread Li Jin (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-1710:
--
Summary: [Java] Remove non-nullable vectors in new vector class hierarchy   
(was: [Java] Decide what to do with non-nullable vectors in new vector class 
hierarchy )

> [Java] Remove non-nullable vectors in new vector class hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1828) [C++] Implement hash kernel specialization for BooleanType

2017-11-22 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1828:
--
Labels: pull-request-available  (was: )

> [C++] Implement hash kernel specialization for BooleanType
> --
>
> Key: ARROW-1828
> URL: https://issues.apache.org/jira/browse/ARROW-1828
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Follow up to ARROW-1559



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1828) [C++] Implement hash kernel specialization for BooleanType

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263274#comment-16263274
 ] 

ASF GitHub Bot commented on ARROW-1828:
---

wesm opened a new pull request #1350: ARROW-1828: [C++] Hash kernel 
specialization for BooleanType
URL: https://github.com/apache/arrow/pull/1350
 
 
   This is a bit tedious because we want to preserve the order in which the 
unique values were observed. 
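
For context, the semantics the kernel needs are order-preserving uniqueness over a two-slot table; a plain-Python sketch of the intended behaviour (not the C++ implementation itself):

```
def unique_preserving_order(values):
    """Distinct non-null values in the order they were first observed."""
    seen = set()
    out = []
    for v in values:
        if v is None:      # nulls are observed but never enter the dictionary
            continue
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

assert unique_preserving_order([True, None, True, False]) == [True, False]
assert unique_preserving_order([False, True, True]) == [False, True]
```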


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Implement hash kernel specialization for BooleanType
> --
>
> Key: ARROW-1828
> URL: https://issues.apache.org/jira/browse/ARROW-1828
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Follow up to ARROW-1559



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263283#comment-16263283
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

trxcllnt commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346462332
 
 
   @wesm ah right, I forgot you need the dependencies installed. I've updated 
the release script to run `npm install` before doing things.
   
   You shouldn't need to hand edit the version in package.json, because the 
[`npm version`](https://docs.npmjs.com/cli/version) command will do that for 
you 
[here](https://github.com/apache/arrow/pull/1346/commits/567c570e3568a97e90349e6911fa342b6c08274b#diff-2ca055280723a1db988255852328f368R54).
 If the version is already set in package.json to the value you pass to 
`js-source-release.sh`, the command will fail with a "same-version" error.
   
   `npm version` typically also automatically creates git tags. I had this 
disabled, but found a config variable that lets us specify the [tag version 
prefix](https://docs.npmjs.com/misc/config#tag-version-prefix)
   
   So now [this 
command](https://github.com/apache/arrow/pull/1346/commits/6692e1797a355e7351be16cced42511ff2bb006c#diff-2ca055280723a1db988255852328f368R44)
 will:
   1. increment the package.json version to the `$js_version` you pass to 
`js-source-release.sh`
   2. `git commit -m "[Release] apache-arrow-js-X.Y.Z"`
   3. `git tag -a apache-arrow-js-X.Y.Z` ([specified 
here](https://github.com/apache/arrow/pull/1346/commits/6692e1797a355e7351be16cced42511ff2bb006c#diff-3564d26fe29cca7ef63a785b2ed23ed0R3))
   
   so now you shouldn't have to explicitly create a new git tag before running 
`js-source-release.sh`, as npm will do that automatically.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1836) [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds

2017-11-22 Thread Max Risuhin (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263303#comment-16263303
 ] 

Max Risuhin commented on ARROW-1836:


Thanks! I have raised a [PR|https://github.com/mapbox/variant/pull/163] to remove 
it from the origin repo. Going to publish a PR to remove it from the Arrow utils as well.

> [C++] Fix C4996 warning from arrow/util/variant.h on MSVC builds
> 
>
> Key: ARROW-1836
> URL: https://issues.apache.org/jira/browse/ARROW-1836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Max Risuhin
> Fix For: 0.8.0
>
>
> [~Max Risuhin] can you look into this? This is leaking into downstream users 
> of Arrow. see e.g. 
> https://github.com/apache/parquet-cpp/pull/403/commits/8e40b7d7d8f161a14dfed70cb6d528e82ffa21a9
>  and build failures 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/parquet-cpp/build/1.0.443



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Remove non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263323#comment-16263323
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

icexelloss commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346469505
 
 
   > I was thinking this is what we were going to do, maybe as a followup 
though, so +1 for me. One thing I'm not sure of is: is nullability useful for this 
vector? For example, Spark StructType doesn't have a nullable param, it's up to 
the child type definitions. But maybe it's different in the Arrow sense, I'll 
have to think about that..
   
   I think in Spark, nullability is part of `StructField`, not `DataType`, 
so a nullable struct column in Spark would be:
   ```
   StructField("struct", StructType(Seq(StructField("a", IntegerType) ...)), 
nullable=true)
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Remove non-nullable vectors in new vector class hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1828) [C++] Implement hash kernel specialization for BooleanType

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263325#comment-16263325
 ] 

ASF GitHub Bot commented on ARROW-1828:
---

wesm commented on a change in pull request #1350: ARROW-1828: [C++] Hash kernel 
specialization for BooleanType
URL: https://github.com/apache/arrow/pull/1350#discussion_r152676160
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/hash.cc
 ##
 @@ -368,6 +368,79 @@ class HashTableKernel> : public HashTab
   HashDictionary dict_;
 };
 
+// --
+// Hash table for boolean types
+
+template <typename Type, typename Action>
+class HashTableKernel<Type, Action, enable_if_boolean<Type>> : public 
HashTable {
+ public:
+  HashTableKernel(const std::shared_ptr<DataType>& type, MemoryPool* pool)
+  : HashTable(type, pool) {
+std::fill(table_, table_ + 2, kHashSlotEmpty);
+  }
+
+  Status Append(const ArrayData& arr) override {
+auto action = static_cast<Action*>(this);
+
+RETURN_NOT_OK(action->Reserve(arr.length));
+
+internal::BitmapReader value_reader(arr.buffers[1]->data(), arr.offset, 
arr.length);
+
+#define HASH_INNER_LOOP()  \
+  if (slot == kHashSlotEmpty) {\
+if (!Action::allow_expand) {   \
+  throw HashException("Encountered new dictionary value"); \
+}  \
+table_[j] = slot = static_cast<hash_slot_t>(dict_.size()); \
+dict_.push_back(value);\
+action->ObserveNotFound(slot); \
+  } else { \
+action->ObserveFound(slot);\
+  }
+
+if (arr.null_count != 0) {
+  internal::BitmapReader valid_reader(arr.buffers[0]->data(), arr.offset, 
arr.length);
+  for (int64_t i = 0; i < arr.length; ++i) {
+const bool is_null = valid_reader.IsNotSet();
+const bool value = value_reader.IsSet();
+const int j = value ? 1 : 0;
+hash_slot_t slot = table_[j];
+valid_reader.Next();
+value_reader.Next();
+if (is_null) {
+  action->ObserveNull();
+  continue;
+}
+HASH_INNER_LOOP();
+  }
+} else {
+  for (int64_t i = 0; i < arr.length; ++i) {
+const bool value = value_reader.IsSet();
+const int j = value ? 1 : 0;
+hash_slot_t slot = table_[j];
+value_reader.Next();
+HASH_INNER_LOOP();
+  }
+}
 
 Review comment:
   The macro strategy used elsewhere doesn't quite work here because the 
bit reader for the data has to be advanced. We can address this in a later 
refactoring...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Implement hash kernel specialization for BooleanType
> --
>
> Key: ARROW-1828
> URL: https://issues.apache.org/jira/browse/ARROW-1828
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Follow up to ARROW-1559



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263332#comment-16263332
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

wesm commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346470928
 
 
   Cool, I will give it another go!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263338#comment-16263338
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

wesm commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346471395
 
 
   ```
   $ dev/release/js-source-release.sh 0.2.0 0
   Preparing source for tag apache-arrow-js-0.2.0
   
   > uglifyjs-webpack-plugin@0.4.6 postinstall 
/home/wesm/code/arrow/js/node_modules/webpack/node_modules/uglifyjs-webpack-plugin
   > node lib/post_install.js
   
   npm WARN apache-arrow@0.1.2 requires a peer of command-line-usage@4.0.1 but 
none is installed. You must install peer dependencies yourself.
   npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.1.3 
(node_modules/fsevents):
   npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for 
fsevents@1.1.3: wanted {"os":"darwin","arch":"any"} (current: 
{"os":"linux","arch":"x64"})
   
   added 1269 packages in 22.642s
   v0.2.0
   
   > apache-arrow@0.2.0 version /home/wesm/code/arrow/js
   > npm install && npm run clean:all
   
   npm WARN apache-arrow@0.2.0 requires a peer of command-line-usage@4.0.1 but 
none is installed. You must install peer dependencies yourself.
   npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.1.3 
(node_modules/fsevents):
   npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for 
fsevents@1.1.3: wanted {"os":"darwin","arch":"any"} (current: 
{"os":"linux","arch":"x64"})
   
   added 116 packages in 4.871s
   
   > apache-arrow@0.2.0 clean:all /home/wesm/code/arrow/js
   > run-p clean clean:testdata
   
   
   > apache-arrow@0.2.0 clean /home/wesm/code/arrow/js
   > gulp clean
   
   
   > apache-arrow@0.2.0 clean:testdata /home/wesm/code/arrow/js
   > gulp clean:testdata
   
   [15:59:12] Using gulpfile ~/code/arrow/js/gulpfile.js
   [15:59:12] Using gulpfile ~/code/arrow/js/gulpfile.js
   [15:59:12] Starting 'clean:testdata'...
   [15:59:12] Starting 'clean'...
   [15:59:12] Starting 'clean:ts'...
   [15:59:12] Starting 'clean:apache-arrow'...
   [15:59:12] Starting 'clean:es5:cjs'...
   [15:59:12] Starting 'clean:es2015:cjs'...
   [15:59:12] Starting 'clean:esnext:cjs'...
   [15:59:12] Starting 'clean:es5:esm'...
   [15:59:12] Starting 'clean:es2015:esm'...
   [15:59:12] Starting 'clean:esnext:esm'...
   [15:59:12] Starting 'clean:es5:cls'...
   [15:59:12] Starting 'clean:es2015:cls'...
   [15:59:12] Starting 'clean:esnext:cls'...
   [15:59:12] Starting 'clean:es5:umd'...
   [15:59:12] Starting 'clean:es2015:umd'...
   [15:59:12] Starting 'clean:esnext:umd'...
   [15:59:12] Finished 'clean:testdata' after 5.41 ms
   [15:59:12] Finished 'clean:ts' after 10 ms
   [15:59:12] Finished 'clean:apache-arrow' after 11 ms
   [15:59:12] Finished 'clean:es5:cjs' after 11 ms
   [15:59:12] Finished 'clean:es2015:cjs' after 11 ms
   [15:59:12] Finished 'clean:esnext:cjs' after 11 ms
   [15:59:12] Finished 'clean:es5:esm' after 14 ms
   [15:59:12] Finished 'clean:es2015:esm' after 14 ms
   [15:59:12] Finished 'clean:esnext:esm' after 14 ms
   [15:59:12] Finished 'clean:es5:cls' after 14 ms
   [15:59:12] Finished 'clean:es2015:cls' after 14 ms
   [15:59:12] Finished 'clean:esnext:cls' after 14 ms
   [15:59:12] Finished 'clean:es5:umd' after 14 ms
   [15:59:12] Finished 'clean:es2015:umd' after 14 ms
   [15:59:12] Finished 'clean:esnext:umd' after 14 ms
   [15:59:12] Finished 'clean' after 17 ms
   Cannot continue: unknown git tag: apache-arrow-js-0.2.0
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 

[jira] [Created] (ARROW-1848) [Python] Add documentation examples for reading single Parquet files and datasets from HDFS

2017-11-22 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1848:
---

 Summary: [Python] Add documentation examples for reading single 
Parquet files and datasets from HDFS
 Key: ARROW-1848
 URL: https://issues.apache.org/jira/browse/ARROW-1848
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney


see 
https://stackoverflow.com/questions/47443151/read-a-parquet-files-from-hdfs-using-pyarrow
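
A first cut of what such an example might look like (host, port, and paths are placeholders, not a tested recipe):

```
import pyarrow as pa
import pyarrow.parquet as pq

# Placeholder connection details
fs = pa.hdfs.connect('namenode', 8020)

# Read a single Parquet file from HDFS
with fs.open('/user/data/part-00000.parquet', 'rb') as f:
    table = pq.read_table(f)

# Read a directory of Parquet files as one dataset
dataset = pq.ParquetDataset('/user/data/', filesystem=fs)
table = dataset.read()
```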



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1783) [Python] Convert SerializedPyObject to/from sequence of component buffers with minimal memory allocation / copying

2017-11-22 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1783:
---

Assignee: Wes McKinney

> [Python] Convert SerializedPyObject to/from sequence of component buffers 
> with minimal memory allocation / copying
> --
>
> Key: ARROW-1783
> URL: https://issues.apache.org/jira/browse/ARROW-1783
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> See discussion on Dask org:
> https://github.com/dask/distributed/pull/931
> It would be valuable for downstream users to compute the serialized payload 
> as a sequence of memoryview-compatible objects without having to allocate new 
> memory on write. This means that the component tensor messages must have 
> their metadata and bodies in separate buffers. This will require a bit of 
> work internally to reassemble the object from a collection of {{pyarrow.Buffer}} 
> objects
> see also ARROW-1509
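
For orientation, the status quo this issue improves on, sketched with the existing Python API as I understand it (the component-based conversion is the new work and is deliberately not shown):

```
import pyarrow as pa

obj = {'a': [1, 2, 3], 'b': b'some bytes'}

# Today the payload is materialized into one contiguous buffer, which means
# an extra allocation and copy on write:
buf = pa.serialize(obj).to_buffer()
roundtripped = pa.deserialize(buf)
```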



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1849) [GLib] Add input checks to GArrowRecordBatch

2017-11-22 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-1849:
---

 Summary: [GLib] Add input checks to GArrowRecordBatch
 Key: ARROW-1849
 URL: https://issues.apache.org/jira/browse/ARROW-1849
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
Priority: Minor
 Fix For: 0.8.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1849) [GLib] Add input checks to GArrowRecordBatch

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263427#comment-16263427
 ] 

ASF GitHub Bot commented on ARROW-1849:
---

kou opened a new pull request #1351: ARROW-1849: [GLib] Add input checks to 
GArrowRecordBatch
URL: https://github.com/apache/arrow/pull/1351
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [GLib] Add input checks to GArrowRecordBatch
> 
>
> Key: ARROW-1849
> URL: https://issues.apache.org/jira/browse/ARROW-1849
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ARROW-1849) [GLib] Add input checks to GArrowRecordBatch

2017-11-22 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1849:
--
Labels: pull-request-available  (was: )

> [GLib] Add input checks to GArrowRecordBatch
> 
>
> Key: ARROW-1849
> URL: https://issues.apache.org/jira/browse/ARROW-1849
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1845.
-
Resolution: Fixed

Issue resolved by pull request 1348
[https://github.com/apache/arrow/pull/1348]

> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263455#comment-16263455
 ] 

ASF GitHub Bot commented on ARROW-1845:
---

wesm commented on issue #1348: ARROW-1845: [Python] Expose Decimal128Type
URL: https://github.com/apache/arrow/pull/1348#issuecomment-346488843
 
 
   Here's appveyor passing: 
https://ci.appveyor.com/project/xhochy/arrow/build/1.0.520


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Expose Decimal128Type
> --
>
> Key: ARROW-1845
> URL: https://issues.apache.org/jira/browse/ARROW-1845
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> In the course of the renaming, we forgot to update the Python code to the new 
> {{Decimal128Type}}, thus master is currently failing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1845) [Python] Expose Decimal128Type

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263457#comment-16263457
 ] 

ASF GitHub Bot commented on ARROW-1845:
---

wesm closed pull request #1348: ARROW-1845: [Python] Expose Decimal128Type
URL: https://github.com/apache/arrow/pull/1348
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/doc/source/api.rst b/python/doc/source/api.rst
index 8f2f23d9f..fb2a28677 100644
--- a/python/doc/source/api.rst
+++ b/python/doc/source/api.rst
@@ -50,7 +50,7 @@ Type and Schema Factory Functions
date64
binary
string
-   decimal
+   decimal128
list_
struct
dictionary
diff --git a/python/pyarrow/__init__.py b/python/pyarrow/__init__.py
index c8ded2d3c..c4db36e55 100644
--- a/python/pyarrow/__init__.py
+++ b/python/pyarrow/__init__.py
@@ -35,7 +35,7 @@
  uint8, uint16, uint32, uint64,
  time32, time64, timestamp, date32, date64,
  float16, float32, float64,
- binary, string, decimal,
+ binary, string, decimal128,
  list_, struct, union, dictionary, field,
  type_for_alias,
  DataType, NAType,
diff --git a/python/pyarrow/includes/libarrow.pxd 
b/python/pyarrow/includes/libarrow.pxd
index 73e34c7b2..f1f59384b 100644
--- a/python/pyarrow/includes/libarrow.pxd
+++ b/python/pyarrow/includes/libarrow.pxd
@@ -209,10 +209,10 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil:
 int byte_width()
 int bit_width()
 
-cdef cppclass CDecimalType" arrow::DecimalType"(CFixedSizeBinaryType):
+cdef cppclass CDecimal128Type" 
arrow::Decimal128Type"(CFixedSizeBinaryType):
+CDecimal128Type(int precision, int scale)
 int precision()
 int scale()
-CDecimalType(int precision, int scale)
 
 cdef cppclass CField" arrow::Field":
 const c_string& name()
diff --git a/python/pyarrow/lib.pxd b/python/pyarrow/lib.pxd
index 6413b838f..5abb72ba4 100644
--- a/python/pyarrow/lib.pxd
+++ b/python/pyarrow/lib.pxd
@@ -81,9 +81,9 @@ cdef class FixedSizeBinaryType(DataType):
 const CFixedSizeBinaryType* fixed_size_binary_type
 
 
-cdef class DecimalType(FixedSizeBinaryType):
+cdef class Decimal128Type(FixedSizeBinaryType):
 cdef:
-const CDecimalType* decimal_type
+const CDecimal128Type* decimal128_type
 
 
 cdef class Field:
diff --git a/python/pyarrow/pandas_compat.py b/python/pyarrow/pandas_compat.py
index 0aab9a41b..a50ef96e7 100644
--- a/python/pyarrow/pandas_compat.py
+++ b/python/pyarrow/pandas_compat.py
@@ -80,7 +80,7 @@ def get_logical_type(arrow_type):
 return 'list[{}]'.format(get_logical_type(arrow_type.value_type))
 elif isinstance(arrow_type, pa.lib.TimestampType):
 return 'datetimetz' if arrow_type.tz is not None else 'datetime'
-elif isinstance(arrow_type, pa.lib.DecimalType):
+elif isinstance(arrow_type, pa.lib.Decimal128Type):
 return 'decimal'
 raise NotImplementedError(str(arrow_type))
 
diff --git a/python/pyarrow/public-api.pxi b/python/pyarrow/public-api.pxi
index 90aff9e93..bf670c5c4 100644
--- a/python/pyarrow/public-api.pxi
+++ b/python/pyarrow/public-api.pxi
@@ -78,7 +78,7 @@ cdef public api object pyarrow_wrap_data_type(
 elif type.get().id() == _Type_FIXED_SIZE_BINARY:
 out = FixedSizeBinaryType()
 elif type.get().id() == _Type_DECIMAL:
-out = DecimalType()
+out = Decimal128Type()
 else:
 out = DataType()
 
diff --git a/python/pyarrow/tests/test_array.py 
b/python/pyarrow/tests/test_array.py
index b7b0b1833..a4d781a33 100644
--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -470,7 +470,7 @@ def test_simple_type_construction():
 (pa.binary(length=4), 'bytes'),
 (pa.string(), 'unicode'),
 (pa.list_(pa.list_(pa.int16())), 'list[list[int16]]'),
-(pa.decimal(18, 3), 'decimal'),
+(pa.decimal128(18, 3), 'decimal'),
 (pa.timestamp('ms'), 'datetime'),
 (pa.timestamp('us', 'UTC'), 'datetimetz'),
 (pa.time32('s'), 'time'),
diff --git a/python/pyarrow/tests/test_convert_builtin.py 
b/python/pyarrow/tests/test_convert_builtin.py
index c7a0d49b4..4c3d9e563 100644
--- a/python/pyarrow/tests/test_convert_builtin.py
+++ b/python/pyarrow/tests/test_convert_builtin.py
@@ -314,7 +314,7 @@ def test_mixed_types_fails(self):
 
 def test_decimal(self):
 data = [decimal.Decimal('1234.183'), decimal.Decimal('8094.234')]
-type = pa.decimal(preci

[jira] [Commented] (ARROW-1710) [Java] Remove non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263473#comment-16263473
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

wesm commented on issue #1341: [WIP] ARROW-1710: [Java] Remove Non-Nullable 
Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346491035
 
 
   Needs rebase


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Remove non-nullable vectors in new vector class hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1577) [JS] Package release script for NPM modules

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263480#comment-16263480
 ] 

ASF GitHub Bot commented on ARROW-1577:
---

wesm commented on issue #1346: ARROW-1577: [JS] add ASF release scripts
URL: https://github.com/apache/arrow/pull/1346#issuecomment-346491793
 
 
   Rebased. +1, will merge once the builds run


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Package release script for NPM modules
> ---
>
> Key: ARROW-1577
> URL: https://issues.apache.org/jira/browse/ARROW-1577
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Since the NPM JavaScript module may wish to release more frequently than the 
> main Arrow "monorepo", we should create a script to produce signed NPM 
> artifacts to use for voting:
> * Update metadata for new version
> * Run unit tests
> * Create package tarballs with NPM
> * GPG sign and create md5 and sha512 checksum files
> * Upload to Apache dev SVN
> i.e. like 
> https://github.com/apache/arrow/blob/master/dev/release/02-source.sh, but 
> only for JavaScript.
> We will also want to write instructions for Arrow developers to verify the 
> tarballs to streamline the release votes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263681#comment-16263681
 ] 

ASF GitHub Bot commented on ARROW-1758:
---

wesm commented on issue #1347: ARROW-1758: [Python] Remove pickle=True option 
for object serialization
URL: https://github.com/apache/arrow/pull/1347#issuecomment-346517999
 
 
   The tests fail here, need to use cloudpickle, I think


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263682#comment-16263682
 ] 

ASF GitHub Bot commented on ARROW-1758:
---

Licht-T commented on issue #1347: ARROW-1758: [Python] Remove pickle=True 
option for object serialization
URL: https://github.com/apache/arrow/pull/1347#issuecomment-346518202
 
 
   Okay, I'll check.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263688#comment-16263688
 ] 

ASF GitHub Bot commented on ARROW-1758:
---

wesm commented on a change in pull request #1347: ARROW-1758: [Python] Remove 
pickle=True option for object serialization
URL: https://github.com/apache/arrow/pull/1347#discussion_r152714673
 
 

 ##
 File path: python/pyarrow/serialization.py
 ##
 @@ -67,9 +68,11 @@ def _deserialize_default_dict(data):
 
 serialization_context.register_type(
 type(lambda: 0), "function",
-pickle=True)
+custom_serializer=pickle.dumps, custom_deserializer=pickle.loads)
 
-serialization_context.register_type(type, "type", pickle=True)
+serialization_context.register_type(type, "type",
+custom_serializer=pickle.dumps,
+custom_deserializer=pickle.loads)
 
 Review comment:
   What about lambdas?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1758) [Python] Remove pickle=True option for object serialization

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263687#comment-16263687
 ] 

ASF GitHub Bot commented on ARROW-1758:
---

pcmoritz commented on a change in pull request #1347: ARROW-1758: [Python] 
Remove pickle=True option for object serialization
URL: https://github.com/apache/arrow/pull/1347#discussion_r152641070
 
 

 ##
 File path: python/pyarrow/serialization.py
 ##
 @@ -67,9 +68,11 @@ def _deserialize_default_dict(data):
 
 serialization_context.register_type(
 type(lambda: 0), "function",
-pickle=True)
+custom_serializer=pickle.dumps, custom_deserializer=pickle.loads)
 
-serialization_context.register_type(type, "type", pickle=True)
+serialization_context.register_type(type, "type",
+custom_serializer=pickle.dumps,
+custom_deserializer=pickle.loads)
 
 Review comment:
   Currently it defaults to using `__dict__`, which is more efficient, so it 
seems like a good default; we use that in Ray for pretty much all types (though 
on rare occasions I have seen cases where it doesn't work). I'd prefer to keep 
`__dict__` as the default, but it's no big deal since it can easily be changed.
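
To make the discussion concrete, here is a small sketch of registration after this change; cloudpickle is an assumed extra dependency for the function/lambda case:

```
import pickle
import cloudpickle  # assumed available; plain pickle rejects lambdas
import pyarrow as pa

context = pa.SerializationContext()

# Explicit picklers replace the removed pickle=True flag.
context.register_type(type(lambda: 0), "function",
                      custom_serializer=cloudpickle.dumps,
                      custom_deserializer=cloudpickle.loads)
context.register_type(type, "type",
                      custom_serializer=pickle.dumps,
                      custom_deserializer=pickle.loads)

# A type registered without custom serializers falls back to the
# __dict__-based default discussed above.
class Foo(object):
    def __init__(self, x):
        self.x = x

context.register_type(Foo, "Foo")
buf = pa.serialize(Foo(1), context=context).to_buffer()
assert pa.deserialize(buf, context=context).x == 1
```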


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Remove pickle=True option for object serialization
> ---
>
> Key: ARROW-1758
> URL: https://issues.apache.org/jira/browse/ARROW-1758
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> As pointed out in 
> https://github.com/apache/arrow/pull/1272#issuecomment-340738439, we don't 
> really need this option, it can already be done with pickle.dumps as the 
> custom serializer and pickle.loads as the deserializer.
> This has the additional benefit that it will be very clear to the user which 
> pickler will be used and the user can use a custom pickler easily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1832) [JS] Implement JSON reader for integration tests

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263726#comment-16263726
 ] 

ASF GitHub Bot commented on ARROW-1832:
---

trxcllnt commented on issue #1343: [WIP] ARROW-1832: [JS] Implement JSON reader 
for integration tests
URL: https://github.com/apache/arrow/pull/1343#issuecomment-346523014
 
 
   @TheNeuralBit thinking about this more, I think with a few modifications we 
could reuse all the [vector 
reader](https://github.com/apache/arrow/blob/9b2dc77a4d95c7415edd5be087a5abafc5a7f64c/js/src/reader/vector.ts#L49)
 functions for the JSON reader.
   
   If we move [`createTypedArray` and 
`createValidityArray`](https://github.com/apache/arrow/blob/9b2dc77a4d95c7415edd5be087a5abafc5a7f64c/js/src/reader/vector.ts#L261)
 into the `VectorReaderContext` interface, the `offset` field can become a 
private impl detail of how the 
[`BufferReaderContext`](https://github.com/apache/arrow/blob/9b2dc77a4d95c7415edd5be087a5abafc5a7f64c/js/src/reader/arrow.ts#L167)
 creates TypedArrays. Then we can implement a `JSONReaderContext` that creates 
its TypedArrays from the json. Lastly, we can keep going with your idea for 
shimming the flatbuffers interfaces. With a bit of reflection I think we can 
make this pretty slim.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Implement JSON reader for integration tests
> 
>
> Key: ARROW-1832
> URL: https://issues.apache.org/jira/browse/ARROW-1832
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
>
> Implementing a JSON reader will allow us to write a "validate" script for the 
> consumer half of the integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263809#comment-16263809
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1349: ARROW-1047: [Java] [FollowUp] Move 
ArrowMagic to ipc.message package
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346531235
 
 
   I was thinking of `ArrowMagic` as being more a part of the file protocol 
than a message, but it's fine with me if you prefer to move it there.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WritableByteChannel to which the stream is written.
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data = 1;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 
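
For illustration, a minimal Python sketch of the same framing idea using 
pyarrow's IPC helpers; the length-prefix format and the `frame`/`unframe` 
helpers stand in for whatever the underlying transport (gRPC, HTTP, ...) would 
provide and are not Arrow APIs:

{code:language=python}
# Sketch only: the schema and each record batch are serialized as individual
# encapsulated IPC messages and framed with a length prefix, instead of being
# written as one contiguous Arrow stream.
import struct
import pyarrow as pa


def frame(buf):
    # Prefix an IPC message buffer with its 8-byte little-endian length.
    return struct.pack("<Q", len(buf)) + buf.to_pybytes()


def unframe(payload, offset):
    # Return (message_bytes, next_offset) for the frame starting at offset.
    (length,) = struct.unpack_from("<Q", payload, offset)
    start = offset + 8
    return payload[start:start + length], start + length


batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=["x"])

# Producer side: one framed message for the schema, one per record batch.
wire = frame(batch.schema.serialize()) + frame(batch.serialize())

# Consumer side: reassemble without ever seeing a contiguous byte stream.
schema_bytes, pos = unframe(wire, 0)
batch_bytes, pos = unframe(wire, pos)
schema = pa.ipc.read_schema(pa.py_buffer(schema_bytes))
restored = pa.ipc.read_record_batch(pa.py_buffer(batch_bytes), schema)
assert restored.equals(batch)
{code}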



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Remove non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263812#comment-16263812
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

BryanCutler commented on issue #1341: [WIP] ARROW-1710: [Java] Remove 
Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#issuecomment-346531472
 
 
   Oh yeah, you're right about Spark struct, duh!  So should I change the 
naming of `MapVector` back to `NullableMapVector` <- `MapVector` then, and we 
can discuss combining them into one class in another JIRA?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Remove non-nullable vectors in new vector class hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1710) [Java] Remove non-nullable vectors in new vector class hierarchy

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263820#comment-16263820
 ] 

ASF GitHub Bot commented on ARROW-1710:
---

BryanCutler commented on a change in pull request #1341: [WIP] ARROW-1710: 
[Java] Remove Non-Nullable Vectors
URL: https://github.com/apache/arrow/pull/1341#discussion_r152724007
 
 

 ##
 File path: java/vector/src/main/java/org/apache/arrow/vector/BitVector.java
 ##
 @@ -18,342 +18,469 @@
 
 package org.apache.arrow.vector;
 
+import io.netty.buffer.ArrowBuf;
 import org.apache.arrow.memory.BufferAllocator;
-import org.apache.arrow.memory.BaseAllocator;
-import org.apache.arrow.memory.OutOfMemoryException;
+import org.apache.arrow.vector.complex.impl.BitReaderImpl;
 import org.apache.arrow.vector.complex.reader.FieldReader;
 import org.apache.arrow.vector.holders.BitHolder;
 import org.apache.arrow.vector.holders.NullableBitHolder;
-import org.apache.arrow.vector.schema.ArrowFieldNode;
-import org.apache.arrow.vector.types.Types.MinorType;
-import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.Types;
+import org.apache.arrow.vector.types.pojo.FieldType;
 import org.apache.arrow.vector.util.OversizedAllocationException;
 import org.apache.arrow.vector.util.TransferPair;
 
-import io.netty.buffer.ArrowBuf;
-
 /**
- * Bit implements a vector of bit-width values. Elements in the vector are 
accessed by position from the logical start
- * of the vector. The width of each element is 1 bit. The equivalent Java 
primitive is an int containing the value '0'
- * or '1'.
+ * BitVector implements a fixed width (1 bit) vector of
+ * boolean values which could be null. Each value in the vector corresponds
+ * to a single bit in the underlying data stream backing the vector.
  */
-public final class BitVector extends BaseDataValueVector implements 
FixedWidthVector {
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BitVector.class);
-
-  private final Accessor accessor = new Accessor();
-  private final Mutator mutator = new Mutator();
-
-  int valueCount;
-  private int allocationSizeInBytes = 
getSizeFromCount(INITIAL_VALUE_ALLOCATION);
-  private int allocationMonitor = 0;
+public class BitVector extends BaseFixedWidthVector {
+  private final FieldReader reader;
 
+  /**
+   * Instantiate a BitVector. This doesn't allocate any memory for
+   * the data in vector.
+   *
+   * @param name  name of the vector
+   * @param allocator allocator for memory management.
+   */
   public BitVector(String name, BufferAllocator allocator) {
-super(name, allocator);
+this(name, FieldType.nullable(Types.MinorType.BIT.getType()),
+allocator);
   }
 
-  @Override
-  public void load(ArrowFieldNode fieldNode, ArrowBuf data) {
-// When the vector is all nulls or all defined, the content of the buffer 
can be omitted
-if (data.readableBytes() == 0 && fieldNode.getLength() != 0) {
-  int count = fieldNode.getLength();
-  allocateNew(count);
-  int n = getSizeFromCount(count);
-  if (fieldNode.getNullCount() == 0) {
-// all defined
-// create an all 1s buffer
-// set full bytes
-int fullBytesCount = count / 8;
-for (int i = 0; i < fullBytesCount; ++i) {
-  this.data.setByte(i, 0xFF);
-}
-int remainder = count % 8;
-// set remaining bits
-if (remainder > 0) {
-  byte bitMask = (byte) (0xFFL >>> ((8 - remainder) & 7));
-  this.data.setByte(fullBytesCount, bitMask);
-}
-  } else if (fieldNode.getNullCount() == fieldNode.getLength()) {
-// all null
-// create an all 0s buffer
-zeroVector();
-  } else {
-throw new IllegalArgumentException("The buffer can be empty only if 
there's no data or it's all null or all defined");
-  }
-  this.data.writerIndex(n);
-} else {
-  super.load(fieldNode, data);
-}
-this.valueCount = fieldNode.getLength();
+  /**
+   * Instantiate a BitVector. This doesn't allocate any memory for
+   * the data in vector.
+   *
+   * @param name  name of the vector
+   * @param fieldType type of Field materialized by this vector
+   * @param allocator allocator for memory management.
+   */
+  public BitVector(String name, FieldType fieldType, BufferAllocator 
allocator) {
+super(name, allocator, fieldType, (byte) 0);
+reader = new BitReaderImpl(BitVector.this);
   }
 
+  /**
+   * Get a reader that supports reading values from this vector
+   *
+   * @return Field Reader for this vector
+   */
   @Override
-  public Field getField() {
-throw new UnsupportedOperationException("internal vector");
+  public FieldReader getReader() {
+return reader;
   }
 
+  /**
+   * Get minor type for this vector. The vector holds values belonging
+   * to a particular type.
+   *