[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


nevi-me commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436308395



##
File path: rust/arrow/src/array/builder.rs
##
@@ -841,12 +1048,91 @@ impl ArrayBuilder for StringBuilder {
 }
 }
 
+// Helper function for appending Binary and Utf8 data
+fn append_binary_data(
+builder:  ListBuilder,
+data_type: ,
+data: &[ArrayDataRef],
+) -> Result<()> {
+if !check_array_data_type(data_type, data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+for array in data {
+// convert string to List to reuse list's cast
+let int_data = ()[1];
+let int_data = Arc::new(ArrayData::new(
+DataType::UInt8,
+int_data.len(),
+None,
+None,
+0,
+vec![int_data.clone()],
+vec![],
+)) as ArrayDataRef;
+let list_data = Arc::new(ArrayData::new(
+DataType::List(Box::new(DataType::UInt8)),
+array.len(),
+None,
+array.null_buffer().map(|buf| buf.clone()),
+array.offset(),
+vec![(()[0]).clone()],
+vec![int_data],
+));
+builder.append_data(&[list_data])?;
+}
+Ok(())
+}
+
 impl ArrayBuilder for FixedSizeBinaryBuilder {
 /// Returns the builder as a non-mutable `Any` reference.
 fn as_any() ->  {
 self
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+if !check_array_data_type(_type(), data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+for array in data {
+// convert string to FixedSizeList to reuse list's append
+let int_data = ()[0];
+let int_data = Arc::new(ArrayData::new(
+DataType::UInt8,
+int_data.len(),
+None,
+None,
+0,
+vec![int_data.clone()],
+vec![],
+)) as ArrayDataRef;
+let list_data = Arc::new(ArrayData::new(
+DataType::FixedSizeList(Box::new(DataType::UInt8), 
self.builder.list_len),

Review comment:
   I don't understand you entirely here, mind clarifying?
   Here I'm reconstructing the data as a FSList from FSBinary, 
otherwise I'd get an error about data types not being the same.
   
   I used list_len directly because i32 is a Copy type. I can change to 
builder.value_length()





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


nevi-me commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436307832



##
File path: rust/arrow/src/array/builder.rs
##
@@ -577,6 +632,81 @@ where
 self
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+if !check_array_data_type(_type(), data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+// determine the latest offset on the builder
+let mut cum_offset = if self.offsets_builder.len() == 0 {
+0
+} else {
+// peek into buffer to get last appended offset
+let buffer = self.offsets_builder.buffer.data();
+let len = self.offsets_builder.len();
+let (start, end) = ((len - 1) * 4, len * 4);
+let slice = [start..end];
+i32::from_le_bytes(slice.try_into().unwrap())
+};
+for array in data {
+if array.child_data().len() != 1 {

Review comment:
   The validation is only for data type, so we'd have to make a call on 
whether passing array data that's invalid should be undefined behaviour. If we 
passed in ArrayRef, we'd be certain that data is valid, but otherwise nothing 
stops someone from manually constructing ArrayDataRef incorrectly and passing 
it in. The validation check here at least give the user feedback, otherwise it 
would be a generic bounds error.
   
   I could alternatively customise the validation for different types, with 
potential allocation for both value and bitmap builders for primitive arrays. 
It becomes a slippery slope for lists and structs because those can be deeply 
nested.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache

2020-06-06 Thread GitBox


kou closed pull request #7366:
URL: https://github.com/apache/arrow/pull/7366


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache

2020-06-06 Thread GitBox


kou commented on pull request #7366:
URL: https://github.com/apache/arrow/pull/7366#issuecomment-640129138


   +1
   
   In "C GLib & Ruby / AMD64 Windows MinGW 64 GLib & Ruby": 6min -> 1min
   
   In "C++ / AMD64 Windows MinGW 32 C++": 18min -> 2min
   
   In "C++ / AMD64 Windows MinGW 64 C++": 18min -> 2min
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-06 Thread GitBox


nealrichardson commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r436304103



##
File path: r/R/chunked-array.R
##
@@ -75,23 +75,15 @@ ChunkedArray <- R6Class("ChunkedArray", inherit = 
ArrowObject,
   if (is.integer(i)) {
 i <- Array$create(i)
   }
-  # Invalid: Kernel does not support chunked array arguments
-  # so use the old method for both cases
-  if (inherits(i, "ChunkedArray")) {
-return(shared_ptr(ChunkedArray, ChunkedArray__TakeChunked(self, i)))
-  }
-  assert_is(i, "Array")
-  return(shared_ptr(ChunkedArray, ChunkedArray__Take(self, i)))
+  # Invalid: Tried executing function with non-value type: ChunkedArray

Review comment:
   Done in 
https://github.com/apache/arrow/pull/7308/commits/247a1048e85871214e0456f3ee347a9b3e5388da





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


houqp commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436303006



##
File path: rust/arrow/src/array/builder.rs
##
@@ -841,12 +1048,91 @@ impl ArrayBuilder for StringBuilder {
 }
 }
 
+// Helper function for appending Binary and Utf8 data
+fn append_binary_data(
+builder:  ListBuilder,
+data_type: ,
+data: &[ArrayDataRef],
+) -> Result<()> {
+if !check_array_data_type(data_type, data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+for array in data {
+// convert string to List to reuse list's cast
+let int_data = ()[1];
+let int_data = Arc::new(ArrayData::new(
+DataType::UInt8,
+int_data.len(),
+None,
+None,
+0,
+vec![int_data.clone()],
+vec![],
+)) as ArrayDataRef;
+let list_data = Arc::new(ArrayData::new(
+DataType::List(Box::new(DataType::UInt8)),
+array.len(),
+None,
+array.null_buffer().map(|buf| buf.clone()),
+array.offset(),
+vec![(()[0]).clone()],
+vec![int_data],
+));
+builder.append_data(&[list_data])?;
+}
+Ok(())
+}
+
 impl ArrayBuilder for FixedSizeBinaryBuilder {
 /// Returns the builder as a non-mutable `Any` reference.
 fn as_any() ->  {
 self
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+if !check_array_data_type(_type(), data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+for array in data {
+// convert string to FixedSizeList to reuse list's append
+let int_data = ()[0];
+let int_data = Arc::new(ArrayData::new(
+DataType::UInt8,
+int_data.len(),
+None,
+None,
+0,
+vec![int_data.clone()],
+vec![],
+)) as ArrayDataRef;
+let list_data = Arc::new(ArrayData::new(
+DataType::FixedSizeList(Box::new(DataType::UInt8), 
self.builder.list_len),

Review comment:
   do we need to validate list_len for each ArrayData as well? Also I 
recommend calling value_length() method to get list_len value instead or remove 
value_length entirely and use list_len directly everywhere.

##
File path: rust/arrow/src/array/builder.rs
##
@@ -1018,6 +1304,48 @@ impl ArrayBuilder for StructBuilder {
 self.len
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+if !check_array_data_type(_type(), data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+for array in data {
+let len = array.len();
+if len == 0 {
+continue;
+}
+let offset = array.offset();
+for (builder, child_data) in self
+.field_builders
+.iter_mut()
+.zip(array.child_data().iter())
+{
+// slice child_data to account for offsets
+let child_array = make_array(child_data.clone());
+let sliced = child_array.slice(offset, len);
+builder.append_data(&[sliced.data()])?;
+}
+// append array length
+self.len += len;
+for i in 0..len {
+// account for offset as `ArrayData` does not
+self.bitmap_builder.append(array.is_valid(offset + i))?;

Review comment:
   looks like we missed the reserve call before the loop here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache

2020-06-06 Thread GitBox


github-actions[bot] commented on pull request #7366:
URL: https://github.com/apache/arrow/pull/7366#issuecomment-640122277


   https://issues.apache.org/jira/browse/ARROW-8781



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou opened a new pull request #7366: ARROW-8781: [CI][MinGW] Enable ccache

2020-06-06 Thread GitBox


kou opened a new pull request #7366:
URL: https://github.com/apache/arrow/pull/7366


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7126: [C++][Python] Remove boost_regex dependency

2020-06-06 Thread GitBox


nealrichardson commented on pull request #7126:
URL: https://github.com/apache/arrow/pull/7126#issuecomment-640121051


   FTR Thrift 0.13 (and current master) still requires this boost header: 
https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/transport/TTransportException.h#L23
   
   Can we just vendor it?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


houqp commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436301311



##
File path: rust/arrow/src/array/builder.rs
##
@@ -577,6 +632,81 @@ where
 self
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+if !check_array_data_type(_type(), data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+// determine the latest offset on the builder
+let mut cum_offset = if self.offsets_builder.len() == 0 {
+0
+} else {
+// peek into buffer to get last appended offset
+let buffer = self.offsets_builder.buffer.data();
+let len = self.offsets_builder.len();
+let (start, end) = ((len - 1) * 4, len * 4);
+let slice = [start..end];
+i32::from_le_bytes(slice.try_into().unwrap())
+};
+for array in data {
+if array.child_data().len() != 1 {

Review comment:
   If we are already doing validation before mutating data (which I think 
it's the right behavior), it's best to move all validation logic into the 
initial validation loop.
   
   Another thing we can move into the initial loop is stats gathering for 
optimization purpose. For example, we can calculate aggregated element count in 
the loop. Then before we enter the mutation loop, call reserve method on 
various builders to allocate memory in one go. This way, we don't have to 
trigger multiple memory reallocation in the mutation loop.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


houqp commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436301311



##
File path: rust/arrow/src/array/builder.rs
##
@@ -577,6 +632,81 @@ where
 self
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+if !check_array_data_type(_type(), data) {
+return Err(ArrowError::InvalidArgumentError(
+"Cannot append data to builder if data types are 
different".to_string(),
+));
+}
+// determine the latest offset on the builder
+let mut cum_offset = if self.offsets_builder.len() == 0 {
+0
+} else {
+// peek into buffer to get last appended offset
+let buffer = self.offsets_builder.buffer.data();
+let len = self.offsets_builder.len();
+let (start, end) = ((len - 1) * 4, len * 4);
+let slice = [start..end];
+i32::from_le_bytes(slice.try_into().unwrap())
+};
+for array in data {
+if array.child_data().len() != 1 {

Review comment:
   If we are already doing validation before mutating data (which I think 
it's the right behavior), it's best to move all validation logic into the 
initial validation loop.
   
   Another thing we can move into the initial loop is stats gathering for 
optimization purpose. For example, we can calculate aggregated element count in 
the loop. Then before we enter the mutation loop, call reserve method on 
various builders to allocate memory in one go. This way, we don't have to 
trigger multiple memory reallocation in the mutation loop, especially for 
bitmap_builder.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7126: [C++][Python] Remove boost_regex dependency

2020-06-06 Thread GitBox


wesm closed pull request #7126:
URL: https://github.com/apache/arrow/pull/7126


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7126: [C++][Python] Remove boost_regex dependency

2020-06-06 Thread GitBox


wesm commented on pull request #7126:
URL: https://github.com/apache/arrow/pull/7126#issuecomment-640060481


   Alright. We could still remove the boost::regex dependency since Thrift does 
not require that, but I'll close this for now



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-06-06 Thread GitBox


wesm commented on pull request #6316:
URL: https://github.com/apache/arrow/pull/6316#issuecomment-640059970


   @BryanCutler @kszucs what do you think about setting up a Spark 3.x 
integration test?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #6134: [Python][WIP] Refactor python builtins conversion

2020-06-06 Thread GitBox


wesm commented on pull request #6134:
URL: https://github.com/apache/arrow/pull/6134#issuecomment-640059884


   Closing this, please reopen when you have something you want reviewed



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #6134: [Python][WIP] Refactor python builtins conversion

2020-06-06 Thread GitBox


wesm closed pull request #6134:
URL: https://github.com/apache/arrow/pull/6134


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed pull request #7340: ARROW-8979: [C++] Refine bitmap unaligned word access

2020-06-06 Thread GitBox


wesm closed pull request #7340:
URL: https://github.com/apache/arrow/pull/7340


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


nevi-me commented on pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#issuecomment-640031722


   @houqp I've pushed some changes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


nevi-me commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436257140



##
File path: rust/arrow/src/array/builder.rs
##
@@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder {
 self.len
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+for array in data {
+if let DataType::Struct(fields) = array.data_type() {
+if  != fields {
+return Err(ArrowError::InvalidArgumentError(
+"Struct arrays are not the same".to_string(),
+));
+}
+let len = array.len();
+if len == 0 {
+continue;
+}
+let offset = array.offset();
+let results: Result> = self
+.field_builders
+.iter_mut()
+.zip(array.child_data())
+.map(|(builder, child_data)| {
+// slice child_data to account for offsets
+let child_array = make_array(child_data.clone());
+let sliced = child_array.slice(offset, len);
+builder.append_data(&[sliced.data()])
+})
+.collect();
+results?;

Review comment:
   Thanks, I've changed it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


nevi-me commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436254430



##
File path: rust/arrow/src/array/builder.rs
##
@@ -450,6 +455,44 @@ impl ArrayBuilder for 
PrimitiveBuilder {
 self.values_builder.len
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+let mul = T::get_bit_width() / 8;
+for array in data {
+if array.data_type() != ::get_data_type() {

Review comment:
   I considered this, I ideally wanted to leave the responsibility to the 
user, as we'd provide convenience methods like concat, but since this is 
public, I should check data types first, then return early if there's a 
mismatch. I'll update





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on a change in pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-06 Thread GitBox


houqp commented on a change in pull request #7324:
URL: https://github.com/apache/arrow/pull/7324#discussion_r436252687



##
File path: rust/arrow/src/compute/kernels/concat.rs
##
@@ -0,0 +1,395 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines concat kernel for `ArrayRef`
+//!
+//! Example:
+//!
+//! ```
+//! use std::sync::Arc;
+//! use arrow::array::{ArrayRef, StringArray};
+//! use arrow::compute::concat;
+//!
+//! let arr = concat(![
+//! Arc::new(StringArray::from(vec!["hello", "world"])) as ArrayRef,
+//! Arc::new(StringArray::from(vec!["!"])) as ArrayRef,
+//! ]).unwrap();
+//! assert_eq!(arr.len(), 3);
+//! ```
+
+use std::sync::Arc;
+
+use crate::array::*;
+use crate::datatypes::*;
+use crate::error::{ArrowError, Result};
+
+use TimeUnit::*;
+
+/// Concatenate multiple `ArrayRef` with the same type.
+///
+/// Returns a new ArrayRef.
+pub fn concat(array_list: ) -> Result {
+let mut data_type: Option = None;
+let array_data_list = _list
+.iter()
+.map(|a| {
+let array_data = a.data_ref().clone();
+let curr_data_type = array_data.data_type().clone();
+match _type {
+Some(t) => {
+if t != _data_type {
+return Err(ArrowError::ComputeError(
+"Cannot concat arrays with data type".to_string(),
+));
+}
+}
+None => {
+data_type = Some(curr_data_type);
+}
+}
+Ok(array_data)
+})
+.collect::>>()?;
+
+let data_type = match data_type {
+None => {
+return Err(ArrowError::ComputeError(
+"Cannot concat 0 array".to_string(),
+));
+}
+Some(t) => t,
+};
+match data_type {
+DataType::Utf8 => concat_raw_string(array_data_list),
+DataType::Boolean => concat_primitive::(array_data_list),
+DataType::Int8 => concat_raw_primitive::(array_data_list),
+DataType::Int16 => concat_raw_primitive::(array_data_list),
+DataType::Int32 => concat_raw_primitive::(array_data_list),
+DataType::Int64 => concat_raw_primitive::(array_data_list),
+DataType::UInt8 => concat_raw_primitive::(array_data_list),
+DataType::UInt16 => 
concat_raw_primitive::(array_data_list),
+DataType::UInt32 => 
concat_raw_primitive::(array_data_list),
+DataType::UInt64 => 
concat_raw_primitive::(array_data_list),
+DataType::Float32 => 
concat_raw_primitive::(array_data_list),
+DataType::Float64 => 
concat_raw_primitive::(array_data_list),
+DataType::Date32(_) => 
concat_raw_primitive::(array_data_list),
+DataType::Date64(_) => 
concat_raw_primitive::(array_data_list),
+DataType::Time32(Second) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Time32(Millisecond) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Time64(Microsecond) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Time64(Nanosecond) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Second, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Millisecond, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Microsecond, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Nanosecond, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Interval(IntervalUnit::YearMonth) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Interval(IntervalUnit::DayTime) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Duration(TimeUnit::Second) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Duration(TimeUnit::Millisecond) => {
+concat_raw_primitive::(array_data_list)
+  

[GitHub] [arrow] houqp commented on a change in pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


houqp commented on a change in pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#discussion_r436250767



##
File path: rust/arrow/src/array/builder.rs
##
@@ -577,6 +620,78 @@ where
 self
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+// determine the latest offset on the builder
+let mut cum_offset = if self.offsets_builder.len() == 0 {
+0
+} else {
+// peek into buffer to get last appended offset
+let buffer = self.offsets_builder.buffer.data();
+let len = self.offsets_builder.len();
+let (start, end) = ((len - 1) * 4, len * 4);
+let slice = [start..end];
+i32::from_le_bytes(slice.try_into().unwrap())
+};
+for array in data {
+if let DataType::List(_) = array.data_type() {
+if array.child_data().len() != 1 {
+return Err(ArrowError::InvalidArgumentError(
+"When appending list arrays, data must contain 1 
child_data element"
+.to_string(),
+));
+}
+let len = array.len();
+if len == 0 {
+continue;
+}
+let offset = array.offset();
+
+// `typed_data` is unsafe, however this call is safe as 
`ListArray` has i32 offsets
+unsafe {

Review comment:
   looks like scope of this unsafe block can be reduced?

##
File path: rust/arrow/src/array/builder.rs
##
@@ -450,6 +455,44 @@ impl ArrayBuilder for 
PrimitiveBuilder {
 self.values_builder.len
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+let mul = T::get_bit_width() / 8;
+for array in data {
+if array.data_type() != ::get_data_type() {

Review comment:
   if one of the array data has the wrong type, the append will be 
partially applied, is this intentional?

##
File path: rust/arrow/src/array/builder.rs
##
@@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder {
 self.len
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+for array in data {
+if let DataType::Struct(fields) = array.data_type() {
+if  != fields {
+return Err(ArrowError::InvalidArgumentError(
+"Struct arrays are not the same".to_string(),

Review comment:
   nitpick, perhaps `have different fields` would be a better wording here.

##
File path: rust/arrow/src/array/builder.rs
##
@@ -1018,6 +1271,50 @@ impl ArrayBuilder for StructBuilder {
 self.len
 }
 
+/// Appends data from other arrays into the builder
+///
+/// This is most useful when concatenating arrays of the same type into a 
builder.
+fn append_data( self, data: &[ArrayDataRef]) -> Result<()> {
+for array in data {
+if let DataType::Struct(fields) = array.data_type() {
+if  != fields {
+return Err(ArrowError::InvalidArgumentError(
+"Struct arrays are not the same".to_string(),
+));
+}
+let len = array.len();
+if len == 0 {
+continue;
+}
+let offset = array.offset();
+let results: Result> = self
+.field_builders
+.iter_mut()
+.zip(array.child_data())
+.map(|(builder, child_data)| {
+// slice child_data to account for offsets
+let child_array = make_array(child_data.clone());
+let sliced = child_array.slice(offset, len);
+builder.append_data(&[sliced.data()])
+})
+.collect();
+results?;

Review comment:
   minor, if results is not used, then it's better to use a for loop over 
zipped iterators above to avoid creating a temporary  vector at the end.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure 

[GitHub] [arrow] cyb70289 commented on pull request #7340: ARROW-8979: [C++] Refine bitmap unaligned word access

2020-06-06 Thread GitBox


cyb70289 commented on pull request #7340:
URL: https://github.com/apache/arrow/pull/7340#issuecomment-640014574


   rebased



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #6578: ARROW-7371: [GLib] Add GLib binding of Dataset

2020-06-06 Thread GitBox


kou closed pull request #6578:
URL: https://github.com/apache/arrow/pull/6578


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support

2020-06-06 Thread GitBox


github-actions[bot] commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-640010989


   https://issues.apache.org/jira/browse/ARROW-8289



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on pull request #6578: ARROW-7371: [GLib] Add GLib binding of Dataset

2020-06-06 Thread GitBox


mrkn commented on pull request #6578:
URL: https://github.com/apache/arrow/pull/6578#issuecomment-640010351


   @kou thank you very much!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me closed pull request #7359: ARROW-8723: [Rust] Remove SIMD specific benchmark code

2020-06-06 Thread GitBox


nevi-me closed pull request #7359:
URL: https://github.com/apache/arrow/pull/7359


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me closed pull request #7360: ARROW-9047: [Rust] Fix a segfault when setting zero bits in a zero-length bitset.

2020-06-06 Thread GitBox


nevi-me closed pull request #7360:
URL: https://github.com/apache/arrow/pull/7360


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-06 Thread GitBox


houqp commented on pull request #7324:
URL: https://github.com/apache/arrow/pull/7324#issuecomment-640006429


   @andygrove sent PR to sqlparser: 
https://github.com/andygrove/sqlparser-rs/pull/185.
   
   I recommend we focus on reviewing and getting 
https://github.com/apache/arrow/pull/7365 merged first. Then I can refactor 
this PR to use append_data methods.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on a change in pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-06 Thread GitBox


nevi-me commented on a change in pull request #7324:
URL: https://github.com/apache/arrow/pull/7324#discussion_r436242734



##
File path: rust/arrow/src/compute/kernels/concat.rs
##
@@ -0,0 +1,395 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines concat kernel for `ArrayRef`
+//!
+//! Example:
+//!
+//! ```
+//! use std::sync::Arc;
+//! use arrow::array::{ArrayRef, StringArray};
+//! use arrow::compute::concat;
+//!
+//! let arr = concat(![
+//! Arc::new(StringArray::from(vec!["hello", "world"])) as ArrayRef,
+//! Arc::new(StringArray::from(vec!["!"])) as ArrayRef,
+//! ]).unwrap();
+//! assert_eq!(arr.len(), 3);
+//! ```
+
+use std::sync::Arc;
+
+use crate::array::*;
+use crate::datatypes::*;
+use crate::error::{ArrowError, Result};
+
+use TimeUnit::*;
+
+/// Concatenate multiple `ArrayRef` with the same type.
+///
+/// Returns a new ArrayRef.
+pub fn concat(array_list: ) -> Result {
+let mut data_type: Option = None;
+let array_data_list = _list
+.iter()
+.map(|a| {
+let array_data = a.data_ref().clone();
+let curr_data_type = array_data.data_type().clone();
+match _type {
+Some(t) => {
+if t != _data_type {
+return Err(ArrowError::ComputeError(
+"Cannot concat arrays with data type".to_string(),
+));
+}
+}
+None => {
+data_type = Some(curr_data_type);
+}
+}
+Ok(array_data)
+})
+.collect::>>()?;
+
+let data_type = match data_type {
+None => {
+return Err(ArrowError::ComputeError(
+"Cannot concat 0 array".to_string(),
+));
+}
+Some(t) => t,
+};
+match data_type {
+DataType::Utf8 => concat_raw_string(array_data_list),
+DataType::Boolean => concat_primitive::(array_data_list),
+DataType::Int8 => concat_raw_primitive::(array_data_list),
+DataType::Int16 => concat_raw_primitive::(array_data_list),
+DataType::Int32 => concat_raw_primitive::(array_data_list),
+DataType::Int64 => concat_raw_primitive::(array_data_list),
+DataType::UInt8 => concat_raw_primitive::(array_data_list),
+DataType::UInt16 => 
concat_raw_primitive::(array_data_list),
+DataType::UInt32 => 
concat_raw_primitive::(array_data_list),
+DataType::UInt64 => 
concat_raw_primitive::(array_data_list),
+DataType::Float32 => 
concat_raw_primitive::(array_data_list),
+DataType::Float64 => 
concat_raw_primitive::(array_data_list),
+DataType::Date32(_) => 
concat_raw_primitive::(array_data_list),
+DataType::Date64(_) => 
concat_raw_primitive::(array_data_list),
+DataType::Time32(Second) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Time32(Millisecond) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Time64(Microsecond) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Time64(Nanosecond) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Second, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Millisecond, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Microsecond, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Timestamp(Nanosecond, _) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Interval(IntervalUnit::YearMonth) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Interval(IntervalUnit::DayTime) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Duration(TimeUnit::Second) => {
+concat_raw_primitive::(array_data_list)
+}
+DataType::Duration(TimeUnit::Millisecond) => {
+concat_raw_primitive::(array_data_list)

[GitHub] [arrow] nevi-me commented on pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


nevi-me commented on pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#issuecomment-639990231


   @houqp I finally completed this. The lists and nested lists were quite 
tricky, but I managed in the end.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


github-actions[bot] commented on pull request #7365:
URL: https://github.com/apache/arrow/pull/7365#issuecomment-639990238


   https://issues.apache.org/jira/browse/ARROW-9007



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me opened a new pull request #7365: ARROW-9007: [Rust] Support appending array data to builders

2020-06-06 Thread GitBox


nevi-me opened a new pull request #7365:
URL: https://github.com/apache/arrow/pull/7365


   This enables appending `ArrayDataRef` to builders, which makes it easier to 
concatenate arrays



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org