[GitHub] [arrow] jianxind commented on pull request #7314: ARROW-8996: [C++] runtime support for aggregate sum dense kernel

2020-06-15 Thread GitBox


jianxind commented on pull request #7314:
URL: https://github.com/apache/arrow/pull/7314#issuecomment-643950633


   @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7423: ARROW-8942: [R] Detect compression in reading CSV/JSON

2020-06-15 Thread GitBox


pitrou commented on a change in pull request #7423:
URL: https://github.com/apache/arrow/pull/7423#discussion_r439977192



##
File path: r/tests/testthat/test-csv.R
##
@@ -168,3 +168,15 @@ test_that("read_csv_arrow() respects col_select", {
   tib <- read_csv_arrow(tf, col_select = starts_with("Sepal"), as_data_frame = 
TRUE)
   expect_equal(tib, tibble::tibble(Sepal.Length = iris$Sepal.Length, 
Sepal.Width = iris$Sepal.Width))
 })
+
+test_that("read_csv_arrow() respects col_select", {

Review comment:
   Update test description?









[GitHub] [arrow] ursabot commented on pull request #7314: ARROW-8996: [C++] runtime support for aggregate sum dense kernel

2020-06-15 Thread GitBox


ursabot commented on pull request #7314:
URL: https://github.com/apache/arrow/pull/7314#issuecomment-643959192


   [AMD64 Ubuntu 18.04 C++ Benchmark (#111968)](https://ci.ursalabs.org/#builders/73/builds/74) builder has succeeded.
   
   Revision: f0f0fc1c44a1e79970afea97a540bc44ffbc9d66
   
   ```diff
   ==========================  ==============  ===============  =========
   benchmark                   baseline        contender        change
   ==========================  ==============  ===============  =========
   SumKernelDouble/32768/10    8.514 GiB/sec   11.180 GiB/sec   31.311%
   SumKernelFloat/32768/100    4.818 GiB/sec   6.487 GiB/sec    34.653%
   SumKernelInt64/32768/1      11.216 GiB/sec  17.700 GiB/sec   57.811%
 - SumKernelInt8/32768/10      1.516 GiB/sec   647.895 MiB/sec  -58.274%
   SumKernelDouble/32768/0     10.584 GiB/sec  16.200 GiB/sec   53.064%
   SumKernelDouble/32768/100   9.574 GiB/sec   11.165 GiB/sec   16.611%
   SumKernelInt64/32768/2      8.408 GiB/sec   11.962 GiB/sec   42.275%
   SumKernelInt64/32768/100    10.523 GiB/sec  11.937 GiB/sec   13.440%
 - SumKernelInt16/32768/1      4.707 GiB/sec   1.679 GiB/sec    -64.334%
 - SumKernelInt16/32768/2      3.029 GiB/sec   1.681 GiB/sec    -44.494%
   SumKernelInt32/32768/2      5.340 GiB/sec   6.837 GiB/sec    28.049%
 - SumKernelInt8/32768/100     2.296 GiB/sec   648.265 MiB/sec  -72.424%
   SumKernelInt32/32768/1      5.242 GiB/sec   12.554 GiB/sec   139.507%
 - SumKernelInt8/32768/1       1.670 GiB/sec   650.501 MiB/sec  -61.956%
   SumKernelInt16/32768/0      2.917 GiB/sec   8.116 GiB/sec    178.227%
 - SumKernelInt8/32768/2       1.633 GiB/sec   647.953 MiB/sec  -61.258%
 - SumKernelInt8/32768/1       2.647 GiB/sec   648.361 MiB/sec  -76.083%
   SumKernelInt32/32768/0      5.237 GiB/sec   12.586 GiB/sec   140.334%
   SumKernelInt32/32768/10     6.134 GiB/sec   6.844 GiB/sec    11.577%
 - SumKernelInt16/32768/100    3.937 GiB/sec   1.677 GiB/sec    -57.406%
   SumKernelInt8/32768/0       1.772 GiB/sec   4.985 GiB/sec    181.275%
   SumKernelInt32/32768/1      5.466 GiB/sec   7.039 GiB/sec    28.774%
   SumKernelInt32/32768/100    7.199 GiB/sec   6.858 GiB/sec    -4.747%
   SumKernelInt64/32768/1      8.591 GiB/sec   12.356 GiB/sec   43.822%
   SumKernelFloat/32768/1      6.387 GiB/sec   11.708 GiB/sec   83.320%
   SumKernelDouble/32768/1     8.292 GiB/sec   11.524 GiB/sec   38.977%
   SumKernelFloat/32768/0      6.367 GiB/sec   11.597 GiB/sec   82.144%
   SumKernelInt64/32768/10     9.388 GiB/sec   11.948 GiB/sec   27.258%
   SumKernelDouble/32768/2     7.300 GiB/sec   11.174 GiB/sec   53.070%
   SumKernelFloat/32768/10     4.043 GiB/sec   6.488 GiB/sec    60.492%
   SumKernelInt64/32768/0      11.229 GiB/sec  17.780 GiB/sec   58.336%
 - SumKernelInt16/32768/1      3.098 GiB/sec   1.696 GiB/sec    -45.248%
 - SumKernelInt16/32768/10     3.364 GiB/sec   1.678 GiB/sec    -50.122%
   SumKernelFloat/32768/2      2.474 GiB/sec   6.453 GiB/sec    160.789%
   SumKernelDouble/32768/1     10.535 GiB/sec  16.165 GiB/sec   53.444%
   SumKernelFloat/32768/1      3.863 GiB/sec   6.663 GiB/sec    72.458%
   ==========================  ==============  ===============  =========
   ```
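
For reference, the `change` column in ursabot's report is the relative throughput difference between baseline and contender. A minimal sketch of that arithmetic (illustrative Python, not ursabot's actual code; the function name is an assumption):

```python
def percent_change(baseline_gibps: float, contender_gibps: float) -> float:
    """Relative throughput change in percent, as shown in the benchmark diff."""
    return (contender_gibps - baseline_gibps) / baseline_gibps * 100.0

# SumKernelDouble/32768/10: 8.514 -> 11.180 GiB/sec is roughly +31.3%.
change = percent_change(8.514, 11.180)
```

A negative result (contender slower than baseline) corresponds to the rows prefixed with `-` in the diff above.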







[GitHub] [arrow] maartenbreddels opened a new pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


maartenbreddels opened a new pull request #7434:
URL: https://github.com/apache/arrow/pull/7434


   Following up on #7418, I tried and benchmarked a different approach for
* ascii_lower
* ascii_upper
   
   Before (lower is similar):
   ```
   --------------------------------------------------------------
   Benchmark                 Time             CPU     Iterations
   --------------------------------------------------------------
   AsciiUpper_median   4922843 ns      4918961 ns            10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
   ```
   
   After:
   ```
   --------------------------------------------------------------
   Benchmark                 Time             CPU     Iterations
   --------------------------------------------------------------
   AsciiUpper_median   1391272 ns      1390014 ns            10 bytes_per_second=11.132G/s items_per_second=754.363M/s
   ```
   
   This is a 3.7x speedup (on an AMD machine).
   
   Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found a 4.6x speedup for clang 9 and 6.4x for GCC 9.2.
   
   Also, the test is expanded a bit to include a non-ascii codepoint, to make explicit that it is fine to upper- or lower-case a utf8 string. The non-overlap encoding of utf8 makes this OK (see section 2.5 of the Unicode Standard Core Specification v13.0).
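
The approach benchmarked here replaces a 256-entry lookup table with a per-byte range check, which compilers can vectorize. A minimal sketch of the idea (Python for illustration only; the real kernel is C++ in `scalar_string.cc`):

```python
def ascii_upper(data: bytes) -> bytes:
    """Bytewise ASCII upper-casing, safe on UTF-8 input.

    Bytes in the range [a-z] (0x61-0x7A) can only encode an ASCII letter;
    they never appear inside a multi-byte UTF-8 sequence, whose bytes all
    have the high bit set. So transforming byte-by-byte cannot corrupt
    non-ASCII codepoints.
    """
    out = bytearray(data)
    for i, b in enumerate(out):
        if 0x61 <= b <= 0x7A:      # 'a'..'z'
            out[i] = b - 0x20      # map to 'A'..'Z'
    return bytes(out)
```

Non-ASCII bytes pass through untouched, e.g. `ascii_upper("héllo".encode())` upper-cases only the ASCII letters.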







[GitHub] [arrow] github-actions[bot] commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644017246


   https://issues.apache.org/jira/browse/ARROW-9131







[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


zhztheplayer commented on a change in pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#discussion_r440053169



##
File path: java/dataset/src/main/java/org/apache/arrow/memory/Ownerships.java
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+/**
+ * Utility class managing ownership transfer between native Arrow buffers and Java Arrow buffers.
+ */
+public class Ownerships {
+  private static final Ownerships INSTANCE = new Ownerships();
+
+  private Ownerships() {
+  }
+
+  public static Ownerships get() {
+    return INSTANCE;
+  }
+
+  /**
+   * Returned ledger's ref count is initialized or increased by 1.
+   */
+  public BufferLedger takeOwnership(BaseAllocator allocator, AllocationManager allocationManager) {
+    final BufferLedger ledger = allocationManager.associate(allocator);
+    long size = allocationManager.getSize();
+    boolean succeed = allocator.forceAllocate(size);
+    if (!succeed) {
+      throw new OutOfMemoryException("Target allocator is full");
+    }
+    ((BaseAllocator) allocationManager.getOwningLedger().getAllocator()).releaseBytes(size);

Review comment:
   After going through the code again, I think it should be enough to use 
Reservation.reserve(int) to pre-allocate the bytes needed (see the newest 
commit). And it's not necessary to provide the transfer functionality 
independently, as we don't have to use different allocators when creating the 
buffer.
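
The accounting pattern in `takeOwnership` — charge the bytes to the target allocator first, and release them from the source only once the target has accepted them — can be sketched as follows (illustrative Python with a toy `Allocator` class; the real types are `BaseAllocator`/`BufferLedger` in arrow-memory):

```python
class Allocator:
    """Toy byte-accounting allocator (not Arrow's actual API)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.allocated = 0

    def force_allocate(self, size: int) -> bool:
        # Refuse if the transfer would exceed this allocator's limit.
        if self.allocated + size > self.limit:
            return False
        self.allocated += size
        return True

    def release_bytes(self, size: int) -> None:
        self.allocated -= size


def take_ownership(target: Allocator, source: Allocator, size: int) -> None:
    # Charge the target first; only release from the source afterwards,
    # so the accounted total never dips below the live memory.
    if not target.force_allocate(size):
        raise MemoryError("Target allocator is full")
    source.release_bytes(size)
```

If the target cannot accept the bytes, the source's accounting is left untouched, which mirrors the OOM path in the reviewed code.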









[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


zhztheplayer commented on a change in pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#discussion_r440075959



##
File path: java/dataset/src/main/java/org/apache/arrow/memory/NativeUnderlingMemory.java
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import org.apache.arrow.dataset.jni.JniWrapper;
+
+/**
+ * AllocationManager implementation for Native allocated memory.
+ */
+public class NativeUnderlingMemory extends AllocationManager {

Review comment:
   Hi @jacques-n,
   
   Thanks for your suggestion, but I think the "weaknesses" of the current 
implementation might be somewhat overstated. Regarding the weaknesses, it's of 
course urgent for the ORC adapter to improve its memory allocation process, because 
all of the memory allocated by ORC escapes the Java allocator. 
However, we don't have the same problem here in Datasets: the allocator is going to 
manage all the native memory. The only issue we may have is that an OOM error 
might be thrown after a buffer is allocated rather than before we create it. 
But because the OOM will eventually be thrown and the over-allocated memory will 
eventually be released, the impact will be very small.
   
   And I know we could allocate Java memory directly from C++ 
via JNI, but the code would become messy and performance would get worse. That 
is why I didn't propose such a solution in this PR.









[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


zhztheplayer commented on a change in pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#discussion_r440076921



##
File path: java/dataset/src/main/java/org/apache/arrow/dataset/jni/NativeScanner.java
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.dataset.jni;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.NoSuchElementException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.stream.Collectors;
+
+import org.apache.arrow.dataset.scanner.ScanTask;
+import org.apache.arrow.dataset.scanner.Scanner;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.BufferLedger;
+import org.apache.arrow.memory.NativeUnderlingMemory;
+import org.apache.arrow.memory.Ownerships;
+import org.apache.arrow.vector.ipc.message.ArrowFieldNode;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+import org.apache.arrow.vector.types.pojo.Schema;
+import org.apache.arrow.vector.util.SchemaUtility;
+
+/**
+ * Native implementation of {@link Scanner}. Note that it currently emits only a single scan task of type
+ * {@link NativeScanTask}, which is internally a combination of all scan task instances returned by the
+ * native scanner.
+ */
+public class NativeScanner implements Scanner {
+
+  private final AtomicBoolean closed = new AtomicBoolean(false);
+  private final AtomicBoolean executed = new AtomicBoolean(false);
+  private final NativeContext context;
+  private final long scannerId;
+
+  public NativeScanner(NativeContext context, long scannerId) {
+    this.context = context;
+    this.scannerId = scannerId;
+  }
+
+  ScanTask.BatchIterator execute() {
+    if (closed.get()) {
+      throw new NativeInstanceClosedException();
+    }
+    if (!executed.compareAndSet(false, true)) {
+      throw new UnsupportedOperationException("NativeScanner cannot be executed more than once. Consider creating " +
+          "new scanner instead");
+    }
+    return new ScanTask.BatchIterator() {
+      private ArrowRecordBatch peek = null;
+
+      @Override
+      public void close() {
+        NativeScanner.this.close();
+      }
+
+      @Override
+      public boolean hasNext() {
+        if (closed.get()) {
+          throw new NativeInstanceClosedException();
+        }
+        if (peek != null) {
+          return true;
+        }
+        NativeRecordBatchHandle handle = JniWrapper.get().nextRecordBatch(scannerId);
+        if (handle == null) {
+          return false;
+        }
+        final ArrayList<ArrowBuf> buffers = new ArrayList<>();
+        for (NativeRecordBatchHandle.Buffer buffer : handle.getBuffers()) {
+          final BufferAllocator allocator = context.getAllocator();
+          final NativeUnderlingMemory am = NativeUnderlingMemory.create(allocator,
+              (int) buffer.size, buffer.nativeInstanceId, buffer.memoryAddress);

Review comment:
   done









[GitHub] [arrow] xhochy commented on pull request #7335: ARROW-9018: [C++] Remove APIs that were marked as deprecated in 0.17.0 and prior

2020-06-15 Thread GitBox


xhochy commented on pull request #7335:
URL: https://github.com/apache/arrow/pull/7335#issuecomment-644053464


   @kou Thanks for the heads-up, made a PR: https://github.com/blue-yonder/turbodbc/pull/270







[GitHub] [arrow] pitrou commented on a change in pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


pitrou commented on a change in pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#discussion_r440095501



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -64,77 +64,30 @@ void StringDataTransform(KernelContext* ctx, const ExecBatch& batch,
   }
 }
 
-// Generated with
-//
-// print("static constexpr uint8_t kAsciiUpperTable[] = {")
-// for i in range(256):
-//     if i > 0: print(', ', end='')
-//     if i >= ord('a') and i <= ord('z'):
-//         print(i - 32, end='')
-//     else:
-//         print(i, end='')
-// print("};")
-
-static constexpr uint8_t kAsciiUpperTable[] = {
-    0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,  15,
-    16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,  30,  31,
-    32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,
-    48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,
-    64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,  79,
-    80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,
-    96,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,  79,
-    80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  123, 124, 125, 126, 127,
-    128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
-    144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
-    160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
-    176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
-    192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
-    208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
-    224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
-    240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255};
-
 void TransformAsciiUpper(const uint8_t* input, int64_t length, uint8_t* output) {
   for (int64_t i = 0; i < length; ++i) {
-    *output++ = kAsciiUpperTable[*input++];
+    const uint8_t utf8_code_unit = *input++;
+    // Code units in the range [a-z] can only be an encoding of an ascii
+    // character/codepoint, not the 2nd, 3rd or 4th code unit (byte) of an different
+    // codepoint. This guaranteed by non-overal design of the unicode standard. (see

Review comment:
   "non-overlap"
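
The non-overlap property the comment refers to is easy to verify: in UTF-8, every byte of a multi-byte sequence has its high bit set (lead bytes are `0b11xxxxxx`, continuation bytes `0b10xxxxxx`), so an ASCII byte in [a-z] can never occur inside the encoding of a non-ASCII codepoint. A quick check (illustrative Python):

```python
def multibyte_bytes(s: str):
    """All bytes belonging to multi-byte UTF-8 sequences in s."""
    return [b for ch in s
            for b in ch.encode("utf-8")
            if len(ch.encode("utf-8")) > 1]

# Every such byte is >= 0x80, so none collides with 'a'..'z' (0x61-0x7A).
sample = "é€𝄞ß"
assert all(b >= 0x80 for b in multibyte_bytes(sample))
```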









[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


zhztheplayer commented on a change in pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#discussion_r440098238



##
File path: java/dataset/src/main/java/org/apache/arrow/dataset/jni/NativeScanner.java
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.dataset.jni;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.NoSuchElementException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.stream.Collectors;
+
+import org.apache.arrow.dataset.scanner.ScanTask;
+import org.apache.arrow.dataset.scanner.Scanner;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.BufferLedger;
+import org.apache.arrow.memory.NativeUnderlingMemory;
+import org.apache.arrow.memory.Ownerships;
+import org.apache.arrow.vector.ipc.message.ArrowFieldNode;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+import org.apache.arrow.vector.types.pojo.Schema;
+import org.apache.arrow.vector.util.SchemaUtility;
+
+/**
+ * Native implementation of {@link Scanner}. Note that it currently emits only a single scan task of type
+ * {@link NativeScanTask}, which is internally a combination of all scan task instances returned by the
+ * native scanner.
+ */
+public class NativeScanner implements Scanner {
+
+  private final AtomicBoolean closed = new AtomicBoolean(false);
+  private final AtomicBoolean executed = new AtomicBoolean(false);
+  private final NativeContext context;
+  private final long scannerId;
+
+  public NativeScanner(NativeContext context, long scannerId) {
+    this.context = context;
+    this.scannerId = scannerId;
+  }
+
+  ScanTask.BatchIterator execute() {
+    if (closed.get()) {
+      throw new NativeInstanceClosedException();
+    }
+    if (!executed.compareAndSet(false, true)) {
+      throw new UnsupportedOperationException("NativeScanner cannot be executed more than once. Consider creating " +
+          "new scanner instead");
+    }
+    return new ScanTask.BatchIterator() {
+      private ArrowRecordBatch peek = null;
+
+      @Override
+      public void close() {
+        NativeScanner.this.close();
+      }
+
+      @Override
+      public boolean hasNext() {
+        if (closed.get()) {
Review comment:
   Thanks for catching! I've pushed a mutex solution.
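
The peek-based `hasNext`/`next` pattern used by this batch iterator, with a lock guarding the pre-fetched element, can be sketched language-neutrally (illustrative Python; `fetch_next` stands in for `JniWrapper.nextRecordBatch` and returns `None` when the source is exhausted):

```python
import threading

class PeekingBatchIterator:
    """Wraps a fetch-style source as a has_next/next iterator.

    has_next pre-fetches one element ("peek"); a lock makes the two
    methods safe to call from multiple threads (a sketch of the idea,
    not the actual NativeScanner code).
    """

    def __init__(self, fetch_next):
        self._fetch = fetch_next
        self._peek = None
        self._lock = threading.Lock()

    def has_next(self) -> bool:
        with self._lock:
            if self._peek is None:
                self._peek = self._fetch()
            return self._peek is not None

    def next(self):
        with self._lock:
            if self._peek is None:
                self._peek = self._fetch()
            if self._peek is None:
                raise StopIteration
            batch, self._peek = self._peek, None
            return batch
```

Because `has_next` caches the fetched element rather than consuming it, calling `has_next` several times in a row is harmless.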









[GitHub] [arrow] maartenbreddels commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


maartenbreddels commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644075652


   I also thought that we could do a bit check instead of the range check, e.g. 
`(code_unit & 0b11100000) == 0b01100000`, but that would also transform the 
backtick for instance (binary value 0b01100000).
   
   The generated code looks vectorized indeed. I didn't look into the details 
of the code generated by clang and GCC; it seems their performance is a bit 
different, so we might be able to squeeze out a bit more if we want. Happy to 
look into that later (create a new issue), but I'd rather spend my time on other 
functions now.







[GitHub] [arrow] kszucs commented on pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels

2020-06-15 Thread GitBox


kszucs commented on pull request #7417:
URL: https://github.com/apache/arrow/pull/7417#issuecomment-644083982


   Thanks!







[GitHub] [arrow] pitrou commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


pitrou commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644084984


   It's ok, there's no need to further optimize those functions.







[GitHub] [arrow] pitrou commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


pitrou commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440130579



##
File path: cpp/src/arrow/testing/random.h
##
@@ -250,6 +250,9 @@ class ARROW_EXPORT RandomArrayGenerator {
int32_t min_length, int32_t max_length,
double null_probability = 0);
 
+  std::shared_ptr<Array> Of(std::shared_ptr<DataType> type, int64_t size,

Review comment:
   Call it `Array`?

##
File path: cpp/src/arrow/compute/kernels/test_util.h
##
@@ -97,5 +99,63 @@ using TestingStringTypes =
 
 static constexpr random::SeedType kRandomSeed = 0x0ff1ce;
 
+struct ScalarFunctionPropertyTestParam {

Review comment:
   I'm not sure what the name "property" is supposed to refer to.

##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+namespace compute {
+
+struct IsValid {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+    checked_cast<BooleanScalar*>(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+    if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+        out->length == arr.length) {
+      out->buffers[1] = arr.buffers[0];
+      return;
+    }
+
+    if (arr.null_count == 0 || arr.buffers[0] == nullptr) {
+      BitUtil::SetBitsTo(out->buffers[1]->mutable_data(), out->offset, out->length, true);
+      return;
+    }
+
+    internal::CopyBitmap(arr.buffers[0]->data(), arr.offset, arr.length,
+                         out->buffers[1]->mutable_data(), out->offset);
+  }
+};
+
+struct IsNull {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+    checked_cast<BooleanScalar*>(out)->value = !in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+    if (arr.null_count == 0 || arr.buffers[0] == nullptr) {
+      BitUtil::SetBitsTo(out->buffers[1]->mutable_data(), out->offset, out->length,
+                         false);
+      return;
+    }
+
+    internal::InvertBitmap(arr.buffers[0]->data(), arr.offset, arr.length,
+                           out->buffers[1]->mutable_data(), out->offset);
+  }
+};
+
+namespace codegen {
+
+void MakeFunction(std::string name, std::vector<InputType> in_types, OutputType out_type,
+                  ArrayKernelExec exec, FunctionRegistry* registry,
+                  NullHandling::type null_handling) {
+  Arity arity{static_cast<int>(in_types.size())};
+  auto func = std::make_shared<ScalarFunction>(name, arity);
+
+  ScalarKernel kernel(std::move(in_types), out_type, exec);
+  kernel.null_handling = null_handling;
+  kernel.can_write_into_slices = true;
+
+  DCHECK_OK(func->AddKernel(std::move(kernel)));
+  DCHECK_OK(registry->AddFunction(std::move(func)));
+}
+
+}  // namespace codegen

Review comment:
   Looks like most of this file should be under the anonymous namespace.
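
The semantics of the two kernels in this file are simple bitmap operations: `is_valid` reproduces the validity bitmap (all-ones when there is no bitmap, i.e. no nulls) and `is_null` inverts it. A sketch with Python lists standing in for packed bitmaps (illustrative only; the C++ code works on packed bits with offsets):

```python
def is_valid_bits(validity, length):
    """Boolean output of the is_valid kernel for one array."""
    if validity is None:
        # No validity bitmap means the array has no nulls.
        return [True] * length
    return list(validity[:length])

def is_null_bits(validity, length):
    """is_null is exactly the bitwise inversion of is_valid."""
    return [not b for b in is_valid_bits(validity, length)]
```

This also explains the fast path in the diff above: when offsets and lengths line up, `is_valid` can share the input's validity buffer directly instead of copying it.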

##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"

[GitHub] [arrow] pitrou commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


pitrou commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644104830


   Travis-CI failure is unrelated.







[GitHub] [arrow] pitrou closed pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


pitrou closed pull request #7434:
URL: https://github.com/apache/arrow/pull/7434


   







[GitHub] [arrow] pitrou commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


pitrou commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644106486


   Thank you @maartenbreddels !







[GitHub] [arrow] romainfrancois opened a new pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-15 Thread GitBox


romainfrancois opened a new pull request #7435:
URL: https://github.com/apache/arrow/pull/7435


   ``` r
   library(arrow)
   #> 
   #> Attaching package: 'arrow'
   #> The following object is masked from 'package:utils':
   #> 
   #> timestamp
   Array$create(list(data.frame(a = 1:3, b = 11:13)))
   #> ListArray
   #> <list<struct<a: int32, b: int32>>>
   #> [
   #>   -- is_valid: all not null
   #>   -- child 0 type: int32
   #> [
   #>   1,
   #>   2,
   #>   3
   #> ]
   #>   -- child 1 type: int32
   #> [
   #>   11,
   #>   12,
   #>   13
   #> ]
   #> ]
   ```
   
   Created on 2020-06-15 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)
   
   However does not yet deal with the original issue: 
   
   ``` r
   library(arrow)
   #> 
   #> Attaching package: 'arrow'
   #> The following object is masked from 'package:utils':
   #> 
   #> timestamp
   
   nrows <- 1:3
   df <- tibble::tibble(
     id = 1L,
     data = list(
       tibble::tibble(
         a = letters[nrows],
         b = as.integer(nrows),
         c = as.factor(a),
         d = b / 2
       )
     )
   )
   Table$create(df)
   #> Error in Table__from_dots(dots, schema): NotImplemented: Converting vector to arrow type dictionary not implemented
   ```
   
   Created on 2020-06-15 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)







[GitHub] [arrow] github-actions[bot] commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644128787


   https://issues.apache.org/jira/browse/ARROW-8779







[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440182319



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+namespace compute {
+
+struct IsValid {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+    checked_cast<BooleanScalar*>(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+    if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+        out->length == arr.length) {
+      out->buffers[1] = arr.buffers[0];
+      return;
+    }
+
+    if (arr.null_count == 0 || arr.buffers[0] == nullptr) {
+      BitUtil::SetBitsTo(out->buffers[1]->mutable_data(), out->offset, out->length, true);
+      return;
+    }
+
+    internal::CopyBitmap(arr.buffers[0]->data(), arr.offset, arr.length,
+                         out->buffers[1]->mutable_data(), out->offset);
+  }
+};
+
+struct IsNull {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+    checked_cast<BooleanScalar*>(out)->value = !in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+    if (arr.null_count == 0 || arr.buffers[0] == nullptr) {
+      BitUtil::SetBitsTo(out->buffers[1]->mutable_data(), out->offset, out->length,
+                         false);
+      return;
+    }
+
+    internal::InvertBitmap(arr.buffers[0]->data(), arr.offset, arr.length,
+                           out->buffers[1]->mutable_data(), out->offset);
+  }
+};
+
+namespace codegen {
+
+void MakeFunction(std::string name, std::vector<InputType> in_types, OutputType out_type,
+                  ArrayKernelExec exec, FunctionRegistry* registry,
+                  NullHandling::type null_handling) {
+  Arity arity{static_cast<int>(in_types.size())};
+  auto func = std::make_shared<ScalarFunction>(name, arity);
+
+  ScalarKernel kernel(std::move(in_types), out_type, exec);
+  kernel.null_handling = null_handling;
+  kernel.can_write_into_slices = true;
+
+  DCHECK_OK(func->AddKernel(std::move(kernel)));
+  DCHECK_OK(registry->AddFunction(std::move(func)));
+}
+
+}  // namespace codegen

Review comment:
   okay









[GitHub] [arrow] maartenbreddels commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


maartenbreddels commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644142377


   Thanks, let me know if my workflow is ok, or if I can make things go 
smoother.
   
   PS: I am looking for a document describing the kernel design. I see the two 
cases (`if (batch[0].kind() == Datum::ARRAY) {` and the else clause), but I am 
not sure I fully understand them, or where this is described, if it is anywhere.
   
   PS2: Are there alternative channels for quick/small questions, or is this 
fine?







[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440185720



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr<DataType> type_singleton() {
+    return TypeTraits<ArrowType>::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+    ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+    ASSERT_EQ(actual.kind(), expected.kind());
+    if (actual.kind() == Datum::ARRAY) {
+      ASSERT_OK(actual.make_array()->ValidateFull());
+      AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+    } else {
+      AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+    }
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& expected_json) {
+    AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+                ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function<Result<Datum>(const Datum&, ExecContext*)>;
+  UnaryFunction func_;

Review comment:
   It's a bit less boilerplate, but probably not enough to justify the 
different function. I'll remove it.









[GitHub] [arrow] pitrou commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


pitrou commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644144990


   I think the current approach (implement kernels one-by-one) is reasonable 
and manageable for us (and for you as well I hope).
   
   I don't think there's much documentation for now around the kernel design. 
Basically, kernels should usually be able to process two kinds of inputs 
(represented as `Datum`s): arrays and scalars. That said, arrays are the 
dominant case, so if we leave scalars unimplemented in a given kernel, that's 
not an urgent problem.
   
   For quick development questions, we have a public chat instance at 
https://ursalabs.zulipchat.com/ - just register though and you can chat with 
the team. The main channel there is the "dev" channel, and Zulip allows you to 
create subtopics - don't hesitate to use those!
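
   The array-vs-scalar split described above can be sketched with toy types. 
This is only an illustration of the dispatch idea: `Kind`, `Datum`, `BoolDatum`, 
and `IsValidKernel` here are stand-in names, not Arrow's actual API.

   ```cpp
   #include <cassert>
   #include <vector>

   // Toy stand-in for Arrow's Datum: a value is either an ARRAY or a SCALAR.
   enum class Kind { ARRAY, SCALAR };

   struct Datum {
     Kind kind;
     std::vector<bool> validity;  // per-element validity when kind == ARRAY
     bool is_valid = false;       // when kind == SCALAR
   };

   struct BoolDatum {
     Kind kind;
     std::vector<bool> values;  // ARRAY result
     bool value = false;        // SCALAR result
   };

   // A unary "is_valid" kernel handles both input kinds.
   BoolDatum IsValidKernel(const Datum& in) {
     if (in.kind == Kind::ARRAY) {
       // Array path: the result values are exactly the validity bitmap.
       return BoolDatum{Kind::ARRAY, in.validity, false};
     }
     // Scalar path: a single boolean.
     return BoolDatum{Kind::SCALAR, {}, in.is_valid};
   }
   ```

   Real kernels operate on bitmaps and `ArrayData`, but the branch on the 
input's `Datum` kind is the same shape.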







[GitHub] [arrow] maartenbreddels commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


maartenbreddels commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644146353


   > I think the current approach (implement kernels one-by-one) is reasonable 
and manageable for us (and for you as well I hope).
   
   No, this is fine.
   
   
   
   > That said, arrays are the dominant case, so if we leave scalars 
unimplemented in a given kernel, that's not an urgent problem.
   
   👍 







[GitHub] [arrow] romainfrancois commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-15 Thread GitBox


romainfrancois commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644147517


   @github-actions autotune everything







[GitHub] [arrow] wesm commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


wesm commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644149831


   > PS2: Are there alternative channels for quick/small questions, or is this 
fine?
   
   Can we use d...@arrow.apache.org so everyone can see the discussion and it's 
searchable via Google later?







[GitHub] [arrow] wesm edited a comment on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


wesm edited a comment on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644149831


   > PS2: Are there alternative channels for quick/small questions, or is this 
fine?
   
   Can we use d...@arrow.apache.org so everyone can see the discussion and it's 
searchable via Google later?
   
   > PS: I am looking for a document describing the kernel design.
   
   Keep in mind that the new kernels framework is only a few weeks old 
(https://github.com/apache/arrow/commit/7ad49eeca5215d9b2a56b6439f1bd6ea3ea9), 
so developer documentation is a bit behind; don't hesitate to ask questions.







[GitHub] [arrow] maartenbreddels commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


maartenbreddels commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644153016


   > Can we use [d...@arrow.apache.org](mailto:d...@arrow.apache.org) so 
everyone can see the discussion and it's searchable via Google later?
   
   I'm ok with that if not considered too noisy (like small cmake questions 
etc).







[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440197092



##
File path: cpp/src/arrow/testing/random.h
##
@@ -250,6 +250,9 @@ class ARROW_EXPORT RandomArrayGenerator {
int32_t min_length, int32_t 
max_length,
double null_probability = 0);
 
+  std::shared_ptr<Array> Of(std::shared_ptr<DataType> type, int64_t size,

Review comment:
   will do









[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440202771



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+namespace compute {
+
+struct IsValid {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+    checked_cast<BooleanScalar*>(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+    if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+        out->length == arr.length) {
+      out->buffers[1] = arr.buffers[0];
+      return;
+    }

Review comment:
   My thought was that this optimization won't be available frequently 
enough to justify sacrificing the benefits of preallocated memory when zero 
copy isn't possible. The intent here was to allow immediate reclamation of 
memory when we can determine we don't need it. I'll refactor to assume zero 
copy is usually possible.
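
   The trade-off under discussion (sharing the input's validity buffer as the 
output buffer versus writing into preallocated memory) can be sketched with toy 
buffers. `ToyArray` and `ToyIsValid` are illustrative names under stated 
assumptions, not Arrow's API.

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <memory>
   #include <vector>

   using Bitmap = std::vector<uint8_t>;

   // Toy array: a validity bitmap (null pointer means "all valid") plus a
   // data bitmap holding the boolean result.
   struct ToyArray {
     std::shared_ptr<Bitmap> validity;
     std::shared_ptr<Bitmap> data;
     int64_t offset = 0;
     int64_t length = 0;
   };

   // IsValid over an array: when a validity bitmap exists, share it as the
   // output's data buffer (zero copy); otherwise materialize an all-ones
   // bitmap, i.e. the write-into-allocated-memory path.
   ToyArray ToyIsValid(const ToyArray& in) {
     ToyArray out;
     out.offset = in.offset;
     out.length = in.length;
     if (in.validity != nullptr) {
       out.data = in.validity;  // zero copy: shares ownership with the input
     } else {
       out.data = std::make_shared<Bitmap>((in.length + 7) / 8, 0xFF);
     }
     return out;
   }
   ```

   In the sharing branch the output holds a second reference to the input's 
bitmap, so the input's memory cannot be reclaimed while the result lives, 
which is the cost bkietz weighs above.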









[GitHub] [arrow] wesm commented on pull request #7434: ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

2020-06-15 Thread GitBox


wesm commented on pull request #7434:
URL: https://github.com/apache/arrow/pull/7434#issuecomment-644158423


   It shouldn't be a problem.







[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440205745



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr<DataType> type_singleton() {
+    return TypeTraits<ArrowType>::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+    ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+    ASSERT_EQ(actual.kind(), expected.kind());
+    if (actual.kind() == Datum::ARRAY) {
+      ASSERT_OK(actual.make_array()->ValidateFull());
+      AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+    } else {
+      AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+    }
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& expected_json) {
+    AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+                ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function<Result<Datum>(const Datum&, ExecContext*)>;
+  UnaryFunction func_;
+};
+
+TEST_F(TestValidityKernels, ArrayIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[false]");
+  this->AssertUnary("[1]", "[true]");
+  this->AssertUnary("[null, 1, 0, null]", "[false, true, true, false]");
+}
+
+TEST_F(TestValidityKernels, ArrayIsValidBufferPassthruOptimization) {
+  Datum arg = ArrayFromJSON(boolean(), "[null, 1, 0, null]");
+  ASSERT_OK_AND_ASSIGN(auto validity, arrow::compute::IsValid(arg));
+  ASSERT_EQ(validity.array()->buffers[1], arg.array()->buffers[0]);
+}
+
+TEST_F(TestValidityKernels, ScalarIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  AssertUnary(Datum(19.7), Datum(true));
+  AssertUnary(MakeNullScalar(float64()), Datum(false));
+}
+
+TEST_F(TestValidityKernels, ArrayIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[true]");
+  this->AssertUnary("[1]", "[false]");
+  this->AssertUnary("[null, 1, 0, null]", "[true, false, false, true]");
+}
+
+TEST_F(TestValidityKernels, DISABLED_ScalarIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  AssertUnary(Datum(19.7), Datum(false));
+  AssertUnary(MakeNullScalar(float64()), Datum(true));
+}
+
+class IsValidProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr<ScalarFunction> GetFunction() override {
+    return internal::checked_pointer_cast<ScalarFunction>(
+        *GetFunctionRegistry()->GetFunction("is_valid"));
+  }
+
+  Result<std::shared_ptr<Scalar>> Contract(const ScalarVector& args,
+                                           const FunctionOptions*) override {
+    return std::make_shared<BooleanScalar>(args[0]->is_valid);
+  }
+};
+
+TEST_P(IsValidProperty, TestIsValidProperty) { Validate(); }
+
+INSTANTIATE_TEST_SUITE_P(IsValidPropertyTest, IsValidProperty,
+ ScalarFunctionPropertyTestParam::Values({
+ {0, 0.0},
+ {1, 0.0},
+ {2, 0.0},
+ {1024, 0.25},
+ }));
+
+class IsNullProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr<ScalarFunction> GetFunction() override {
+    return internal::checked_pointer_cast<ScalarFunction>(
+        *GetFunctionRegistry()->GetFunction("is_null"));
+  }
+
+  Result<std::shared_ptr<Scalar>> Contract(const ScalarVector& args,
+                                           const FunctionOptions*) override {
+    return std::make_shared<BooleanScalar>(!args[0]->is_valid);
+  }
+}

[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440206424



##
File path: cpp/src/arrow/compute/kernels/test_util.h
##
@@ -97,5 +99,63 @@ using TestingStringTypes =
 
 static constexpr random::SeedType kRandomSeed = 0x0ff1ce;
 
+struct ScalarFunctionPropertyTestParam {

Review comment:
   https://en.wikipedia.org/wiki/Property_testing
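
   For context, the property being tested is the contract between the 
functions: for any randomly generated input, `is_null` must be the 
element-wise negation of `is_valid`. A self-contained sketch of such a check 
(toy functions over a validity vector, not Arrow's implementation):

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <random>
   #include <vector>

   // Toy kernels over a validity vector; illustrative names only.
   std::vector<bool> ToyIsValid(const std::vector<bool>& validity) {
     return validity;
   }

   std::vector<bool> ToyIsNull(const std::vector<bool>& validity) {
     std::vector<bool> out(validity.size());
     for (size_t i = 0; i < validity.size(); ++i) out[i] = !validity[i];
     return out;
   }

   // Property check: for a random input of the given length and null
   // probability, is_null must be the element-wise negation of is_valid.
   bool CheckContract(int64_t length, double null_probability, uint32_t seed) {
     std::mt19937 rng(seed);
     std::bernoulli_distribution null_dist(null_probability);
     std::vector<bool> validity(length);
     for (int64_t i = 0; i < length; ++i) validity[i] = !null_dist(rng);
     auto valid = ToyIsValid(validity);
     auto null_mask = ToyIsNull(validity);
     for (int64_t i = 0; i < length; ++i) {
       if (null_mask[i] != !valid[i]) return false;
     }
     return true;
   }
   ```

   The `{length, null_probability}` pairs in the `INSTANTIATE_TEST_SUITE_P` 
calls above play the role of the parameters to such a check.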









[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440220181



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr<DataType> type_singleton() {
+    return TypeTraits<ArrowType>::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+    ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+    ASSERT_EQ(actual.kind(), expected.kind());
+    if (actual.kind() == Datum::ARRAY) {
+      ASSERT_OK(actual.make_array()->ValidateFull());
+      AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+    } else {
+      AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+    }
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& expected_json) {
+    AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+                ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function<Result<Datum>(const Datum&, ExecContext*)>;
+  UnaryFunction func_;
+};
+
+TEST_F(TestValidityKernels, ArrayIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[false]");
+  this->AssertUnary("[1]", "[true]");
+  this->AssertUnary("[null, 1, 0, null]", "[false, true, true, false]");
+}
+
+TEST_F(TestValidityKernels, ArrayIsValidBufferPassthruOptimization) {
+  Datum arg = ArrayFromJSON(boolean(), "[null, 1, 0, null]");
+  ASSERT_OK_AND_ASSIGN(auto validity, arrow::compute::IsValid(arg));
+  ASSERT_EQ(validity.array()->buffers[1], arg.array()->buffers[0]);
+}
+
+TEST_F(TestValidityKernels, ScalarIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  AssertUnary(Datum(19.7), Datum(true));
+  AssertUnary(MakeNullScalar(float64()), Datum(false));
+}
+
+TEST_F(TestValidityKernels, ArrayIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[true]");
+  this->AssertUnary("[1]", "[false]");
+  this->AssertUnary("[null, 1, 0, null]", "[true, false, false, true]");
+}
+
+TEST_F(TestValidityKernels, DISABLED_ScalarIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  AssertUnary(Datum(19.7), Datum(false));
+  AssertUnary(MakeNullScalar(float64()), Datum(true));
+}
+
+class IsValidProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr<ScalarFunction> GetFunction() override {
+    return internal::checked_pointer_cast<ScalarFunction>(
+        *GetFunctionRegistry()->GetFunction("is_valid"));
+  }
+
+  Result<std::shared_ptr<Scalar>> Contract(const ScalarVector& args,
+                                           const FunctionOptions*) override {
+    return std::make_shared<BooleanScalar>(args[0]->is_valid);
+  }
+};
+
+TEST_P(IsValidProperty, TestIsValidProperty) { Validate(); }
+
+INSTANTIATE_TEST_SUITE_P(IsValidPropertyTest, IsValidProperty,
+ ScalarFunctionPropertyTestParam::Values({
+ {0, 0.0},
+ {1, 0.0},
+ {2, 0.0},
+ {1024, 0.25},
+ }));
+
+class IsNullProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr<ScalarFunction> GetFunction() override {
+    return internal::checked_pointer_cast<ScalarFunction>(
+        *GetFunctionRegistry()->GetFunction("is_null"));
+  }
+
+  Result<std::shared_ptr<Scalar>> Contract(const ScalarVector& args,
+                                           const FunctionOptions*) override {
+    return std::make_shared<BooleanScalar>(!args[0]->is_valid);
+  }
+}

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440225706



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##
@@ -0,0 +1,151 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include "arrow/array.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_common.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+namespace compute {
+
+class TestValidityKernels : public ::testing::Test {
+ protected:
+  // XXX Since IsValid and IsNull don't touch any buffers but the null bitmap
+  // testing multiple types seems redundant.
+  using ArrowType = BooleanType;
+
+  using CType = typename ArrowType::c_type;
+
+  static std::shared_ptr<DataType> type_singleton() {
+    return TypeTraits<ArrowType>::type_singleton();
+  }
+
+  void AssertUnary(Datum arg, Datum expected) {
+    ASSERT_OK_AND_ASSIGN(auto actual, func_(arg, nullptr));
+    ASSERT_EQ(actual.kind(), expected.kind());
+    if (actual.kind() == Datum::ARRAY) {
+      ASSERT_OK(actual.make_array()->ValidateFull());
+      AssertArraysApproxEqual(*expected.make_array(), *actual.make_array());
+    } else {
+      AssertScalarsEqual(*expected.scalar(), *actual.scalar());
+    }
+  }
+
+  void AssertUnary(const std::string& arg_json, const std::string& expected_json) {
+    AssertUnary(ArrayFromJSON(type_singleton(), arg_json),
+                ArrayFromJSON(type_singleton(), expected_json));
+  }
+
+  using UnaryFunction = std::function<Result<Datum>(const Datum&, ExecContext*)>;
+  UnaryFunction func_;
+};
+
+TEST_F(TestValidityKernels, ArrayIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[false]");
+  this->AssertUnary("[1]", "[true]");
+  this->AssertUnary("[null, 1, 0, null]", "[false, true, true, false]");
+}
+
+TEST_F(TestValidityKernels, ArrayIsValidBufferPassthruOptimization) {
+  Datum arg = ArrayFromJSON(boolean(), "[null, 1, 0, null]");
+  ASSERT_OK_AND_ASSIGN(auto validity, arrow::compute::IsValid(arg));
+  ASSERT_EQ(validity.array()->buffers[1], arg.array()->buffers[0]);
+}
+
+TEST_F(TestValidityKernels, ScalarIsValid) {
+  func_ = arrow::compute::IsValid;
+
+  AssertUnary(Datum(19.7), Datum(true));
+  AssertUnary(MakeNullScalar(float64()), Datum(false));
+}
+
+TEST_F(TestValidityKernels, ArrayIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  this->AssertUnary("[]", "[]");
+  this->AssertUnary("[null]", "[true]");
+  this->AssertUnary("[1]", "[false]");
+  this->AssertUnary("[null, 1, 0, null]", "[true, false, false, true]");
+}
+
+TEST_F(TestValidityKernels, DISABLED_ScalarIsNull) {
+  func_ = arrow::compute::IsNull;
+
+  AssertUnary(Datum(19.7), Datum(false));
+  AssertUnary(MakeNullScalar(float64()), Datum(true));
+}
+
+class IsValidProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr<ScalarFunction> GetFunction() override {
+    return internal::checked_pointer_cast<ScalarFunction>(
+        *GetFunctionRegistry()->GetFunction("is_valid"));
+  }
+
+  Result<std::shared_ptr<Scalar>> Contract(const ScalarVector& args,
+                                           const FunctionOptions*) override {
+    return std::make_shared<BooleanScalar>(args[0]->is_valid);
+  }
+};
+
+TEST_P(IsValidProperty, TestIsValidProperty) { Validate(); }
+
+INSTANTIATE_TEST_SUITE_P(IsValidPropertyTest, IsValidProperty,
+ ScalarFunctionPropertyTestParam::Values({
+ {0, 0.0},
+ {1, 0.0},
+ {2, 0.0},
+ {1024, 0.25},
+ }));
+
+class IsNullProperty : public ScalarFunctionPropertyMixin {
+ public:
+  std::shared_ptr<ScalarFunction> GetFunction() override {
+    return internal::checked_pointer_cast<ScalarFunction>(
+        *GetFunctionRegistry()->GetFunction("is_null"));
+  }
+
+  Result<std::shared_ptr<Scalar>> Contract(const ScalarVector& args,
+                                           const FunctionOptions*) override {
+    return std::make_shared<BooleanScalar>(!args[0]->is_valid);
+  }
+};

[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440231899



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##

[GitHub] [arrow] bkietz commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


bkietz commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r440239175



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-15 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r44024



##
File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc
##

[GitHub] [arrow] nealrichardson commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


nealrichardson commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644207536


   Thanks @ctring! 
   
   Since the builds have been green, we must not have any CI jobs that test 
building with bundled boost and thrift on MSVC. Should we add one (nightly 
perhaps)? @kou thoughts?
   
   To be clear, if we add such a job, it should fail until the trimmed boost is 
rebuilt, assuming that typeof/incr_registration_group.hpp is required as 
reported here. How exactly did you confirm that these changes work as expected, 
@ctring?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7423: ARROW-8942: [R] Detect compression in reading CSV/JSON

2020-06-15 Thread GitBox


nealrichardson commented on a change in pull request #7423:
URL: https://github.com/apache/arrow/pull/7423#discussion_r440263928



##
File path: r/tests/testthat/test-csv.R
##
@@ -168,3 +168,15 @@ test_that("read_csv_arrow() respects col_select", {
   tib <- read_csv_arrow(tf, col_select = starts_with("Sepal"), as_data_frame = 
TRUE)
   expect_equal(tib, tibble::tibble(Sepal.Length = iris$Sepal.Length, 
Sepal.Width = iris$Sepal.Width))
 })
+
+test_that("read_csv_arrow() respects col_select", {

Review comment:
   Thanks 🤦 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me closed pull request #7432: ARROW-9127: [Rust] Update thrift dependency to 0.13 (latest)

2020-06-15 Thread GitBox


nevi-me closed pull request #7432:
URL: https://github.com/apache/arrow/pull/7432


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-15 Thread GitBox


nevi-me commented on pull request #7324:
URL: https://github.com/apache/arrow/pull/7324#issuecomment-644232183


   We can address any queries in follow-up JIRAs



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me closed pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-15 Thread GitBox


nevi-me closed pull request #7324:
URL: https://github.com/apache/arrow/pull/7324


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #7309: ARROW-8993: [Rust] support reading non-seekable sources

2020-06-15 Thread GitBox


nevi-me commented on pull request #7309:
URL: https://github.com/apache/arrow/pull/7309#issuecomment-644233988


   Hi @zeapo, can you please rebase to fix the merge conflict? I'll merge this 
PR after that



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me closed pull request #7428

2020-06-15 Thread GitBox


nevi-me closed pull request #7428:
URL: https://github.com/apache/arrow/pull/7428


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] fsaintjacques commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


fsaintjacques commented on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-644238932


   Tests are still failing; it looks like ORC is failing. Did you rebase on 
master?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7284: ARROW-7409: [C++][Python] Windows link error LNK1104: cannot open file 'python37_d.lib'

2020-06-15 Thread GitBox


wesm commented on pull request #7284:
URL: https://github.com/apache/arrow/pull/7284#issuecomment-644241777


   @raulbocanegra ping



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7420: ARROW-9022: [C++] Add/Sub/Mul arithmetic kernels with overflow check

2020-06-15 Thread GitBox


kszucs commented on a change in pull request #7420:
URL: https://github.com/apache/arrow/pull/7420#discussion_r440322018



##
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##
@@ -60,6 +68,42 @@ struct Add {
   }
 };
 
+struct AddChecked {
+#if __has_builtin(__builtin_add_overflow)
+  template <typename T>
+  static enable_if_integer<T, T> Call(KernelContext* ctx, T left, T right) {
+    T result;
+    if (__builtin_add_overflow(left, right, &result)) {
+      ctx->SetStatus(Status::Invalid("overflow"));

Review comment:
   Which error should we raise? ExecutionError?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7420: ARROW-9022: [C++] Add/Sub/Mul arithmetic kernels with overflow check

2020-06-15 Thread GitBox


kszucs commented on pull request #7420:
URL: https://github.com/apache/arrow/pull/7420#issuecomment-644258425


   Shall we have benchmarks for the new operators?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] zeapo commented on pull request #7309: ARROW-8993: [Rust] support reading non-seekable sources

2020-06-15 Thread GitBox


zeapo commented on pull request #7309:
URL: https://github.com/apache/arrow/pull/7309#issuecomment-644261323


   I'll fix the build tonight.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


pitrou opened a new pull request #7436:
URL: https://github.com/apache/arrow/pull/7436


   Also remove unnecessary builds such as LLVM.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


pitrou commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644285628


   @github-actions crossbow submit -g wheel



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


pitrou commented on a change in pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#discussion_r440353321



##
File path: cpp/cmake_modules/FindgRPCAlt.cmake
##
@@ -169,13 +169,14 @@ if(gRPCAlt_FOUND)
INTERFACE_INCLUDE_DIRECTORIES 
"${GRPC_INCLUDE_DIR}")
 
   add_library(gRPC::grpc UNKNOWN IMPORTED)
-  set_target_properties(gRPC::grpc

Review comment:
   @kou I'd like your feedback on those changes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644286546


   Revision: a48f8064906bb8f4f6344680efd4884c3963b9a9
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-320](https://github.com/ursa-labs/crossbow/branches/all?query=actions-320)
   
   |Task|Status|
   ||--|
   
|wheel-manylinux1-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux1-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux1-cp35m)|
   
|wheel-manylinux1-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux1-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux1-cp36m)|
   
|wheel-manylinux1-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux1-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux1-cp37m)|
   
|wheel-manylinux1-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux1-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux1-cp38)|
   
|wheel-manylinux2010-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2010-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2010-cp35m)|
   
|wheel-manylinux2010-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2010-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2010-cp36m)|
   
|wheel-manylinux2010-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2010-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2010-cp37m)|
   
|wheel-manylinux2010-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2010-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2010-cp38)|
   
|wheel-manylinux2014-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2014-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2014-cp35m)|
   
|wheel-manylinux2014-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2014-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2014-cp36m)|
   
|wheel-manylinux2014-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2014-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2014-cp37m)|
   
|wheel-manylinux2014-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-320-azure-wheel-manylinux2014-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-320-azure-wheel-manylinux2014-cp38)|
   
|wheel-osx-cp35m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-320-travis-wheel-osx-cp35m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp36m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-320-travis-wheel-osx-cp36m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp37m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-320-travis-wheel-osx-cp37m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp38|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-320-travis-wheel-osx-cp38.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-320-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-320-appveyor-wheel-win-cp36m.svg)](https://ci.a

[GitHub] [arrow] pitrou commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


pitrou commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644291011


   @github-actions crossbow submit -g wheel



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644290997


   https://issues.apache.org/jira/browse/ARROW-9094



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644291915


   Revision: 93a3ffe43d7d6c0f3377cfe5ec63b0969923e6d7
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-321](https://github.com/ursa-labs/crossbow/branches/all?query=actions-321)
   
   |Task|Status|
   |----|------|
   
|wheel-manylinux1-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux1-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux1-cp35m)|
   
|wheel-manylinux1-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux1-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux1-cp36m)|
   
|wheel-manylinux1-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux1-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux1-cp37m)|
   
|wheel-manylinux1-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux1-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux1-cp38)|
   
|wheel-manylinux2010-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2010-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2010-cp35m)|
   
|wheel-manylinux2010-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2010-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2010-cp36m)|
   
|wheel-manylinux2010-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2010-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2010-cp37m)|
   
|wheel-manylinux2010-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2010-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2010-cp38)|
   
|wheel-manylinux2014-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2014-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2014-cp35m)|
   
|wheel-manylinux2014-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2014-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2014-cp36m)|
   
|wheel-manylinux2014-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2014-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2014-cp37m)|
   
|wheel-manylinux2014-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-321-azure-wheel-manylinux2014-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-321-azure-wheel-manylinux2014-cp38)|
   
|wheel-osx-cp35m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-321-travis-wheel-osx-cp35m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp36m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-321-travis-wheel-osx-cp36m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp37m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-321-travis-wheel-osx-cp37m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp38|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-321-travis-wheel-osx-cp38.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-321-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-321-appveyor-wheel-win-cp36m.svg)](https://ci.a

[GitHub] [arrow] fsaintjacques opened a new pull request #7437: ARROW-8943: [C++][Python][Dataset] Add partitioning support to ParquetDatasetFactory

2020-06-15 Thread GitBox


fsaintjacques opened a new pull request #7437:
URL: https://github.com/apache/arrow/pull/7437


   







[GitHub] [arrow] github-actions[bot] commented on pull request #7437: ARROW-8943: [C++][Python][Dataset] Add partitioning support to ParquetDatasetFactory

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7437:
URL: https://github.com/apache/arrow/pull/7437#issuecomment-644304497


   https://issues.apache.org/jira/browse/ARROW-8943







[GitHub] [arrow] fsaintjacques commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-15 Thread GitBox


fsaintjacques commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r440369138



##
File path: r/src/array_from_vector.cpp
##
@@ -201,6 +201,39 @@ struct VectorToArrayConverter {
 return Status::OK();
   }
 
+  template <typename T>
+  arrow::enable_if_t<is_struct_type<T>::value, Status> Visit(const T& type) {
+    using BuilderType = typename TypeTraits<T>::BuilderType;
+    ARROW_RETURN_IF(!Rf_inherits(x, "data.frame"),
+                    Status::RError("Expecting a data frame"));
+
+    auto* struct_builder = checked_cast<StructBuilder*>(builder);
+
+    int64_t n = 0;
+    if (XLENGTH(x) > 0) {
+      // TODO: need a more generic way to get the vec_size()
+      //   perhaps using vctrs::short_vec_size
+      n = XLENGTH(VECTOR_ELT(x, 0));
+    }
+    RETURN_NOT_OK(struct_builder->Reserve(n));
+
+    // TODO: double check but seems fine that no value is NULL

Review comment:
   You can remove the TODO, this is correct.

##
File path: r/tests/testthat/test-Array.R
##
@@ -445,6 +445,9 @@ test_that("Array$create() handles vector -> list arrays 
(ARROW-7662)", {
   expect_array_roundtrip(list(character(0)), list_of(utf8()))
   expect_array_roundtrip(list("itsy", c("bitsy", "spider"), c("is")), 
list_of(utf8()))
   expect_array_roundtrip(list("itsy", character(0), c("bitsy", "spider", 
NA_character_), c("is")), list_of(utf8()))
+
+  # struct
+  expect_array_roundtrip(list(tibble::tibble(a = integer(0))), list_of(struct(a = int32())))

Review comment:
   Add a few more columns: float, string, bool. Also add a failing case where the columns are not all of the same size.









[GitHub] [arrow] nealrichardson commented on pull request #7415: ARROW-7028: [R] Date roundtrip results in different R storage mode

2020-06-15 Thread GitBox


nealrichardson commented on pull request #7415:
URL: https://github.com/apache/arrow/pull/7415#issuecomment-644306996


   It turns out we don't ever, AFAICT, move data from Arrow to R with zero copy 
(https://issues.apache.org/jira/browse/ARROW-9140), so I don't have any 
objections to making this change.







[GitHub] [arrow] bkietz opened a new pull request #7438: ARROW-9105: [C++][Dataset][Python] Infer partition schema from partition expression

2020-06-15 Thread GitBox


bkietz opened a new pull request #7438:
URL: https://github.com/apache/arrow/pull/7438


   







[GitHub] [arrow] fsaintjacques commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-15 Thread GitBox


fsaintjacques commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644308992


   I also updated ARROW-7798 with comments about avoiding the builder in the conversion.







[GitHub] [arrow] nealrichardson commented on a change in pull request #7415: ARROW-7028: [R] Date roundtrip results in different R storage mode

2020-06-15 Thread GitBox


nealrichardson commented on a change in pull request #7415:
URL: https://github.com/apache/arrow/pull/7415#discussion_r440371130



##
File path: r/tests/testthat/test-Array.R
##
@@ -202,6 +202,8 @@ test_that("array supports Date (ARROW-3340)", {
   expect_array_roundtrip(d2, date32())
   # PSA: IngestDoubleRange(Date32Builder) truncates decimals, so this only
   # works where the dates are integer-ish
+
+  expect_equal(typeof(Array$create(d)$as_vector()), "double")

Review comment:
   If we want to assert about the R storage type, I'd move that assertion into `expect_array_roundtrip()`, defined at the top of this file, below L30:
   
   ```r
   expect_identical(typeof(as.vector(a)), typeof(x))
   ```
   
   so that we can test all types. (I did this locally just now on master, and 
only Date fails.)
   
   I also wonder if it would be good to add a test here with `dplyr::if_else` 
to reproduce the original issue. IMO the storage mode is only interesting 
because of the behavior side effects.









[GitHub] [arrow] pitrou commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


pitrou commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644316784


   @kszucs What is the procedure to push the updated docker images?







[GitHub] [arrow] github-actions[bot] commented on pull request #7438: ARROW-9105: [C++][Dataset][Python] Infer partition schema from partition expression

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7438:
URL: https://github.com/apache/arrow/pull/7438#issuecomment-644317249


   https://issues.apache.org/jira/browse/ARROW-9105







[GitHub] [arrow] kszucs commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


kszucs commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644320852


   > @kszucs What is the procedure to push the updated docker images?
   
   The nightlies running from master branch automatically push the updated 
images.







[GitHub] [arrow] kszucs opened a new pull request #7439: ARROW-4309: [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled

2020-06-15 Thread GitBox


kszucs opened a new pull request #7439:
URL: https://github.com/apache/arrow/pull/7439


   Currently testing it; sadly, docker-compose doesn't support the NVIDIA runtime, so some additional documentation and/or archery development is required.







[GitHub] [arrow] kszucs commented on pull request #6512: ARROW-8430: [CI] Configure self-hosted runners for Github Actions

2020-06-15 Thread GitBox


kszucs commented on pull request #6512:
URL: https://github.com/apache/arrow/pull/6512#issuecomment-644325417


   Sure, updating to use newer Ubuntu versions.







[GitHub] [arrow] kszucs commented on pull request #7439: ARROW-4309: [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled

2020-06-15 Thread GitBox


kszucs commented on pull request #7439:
URL: https://github.com/apache/arrow/pull/7439#issuecomment-644326074


   I'm going to add an extra hack to translate the `docker-compose run` command directly into `docker run`; we just need to merge #6512 first to prevent conflicts.
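   For context, a sketch of what such a hack amounts to: rendering a compose service as a plain `docker run` invocation so a GPU flag can be injected (the image tag and script path below are hypothetical placeholders, not the real compose entries):

   ```shell
   # Build the `docker run` command a compose service would translate to,
   # adding the `--gpus all` flag that docker-compose could not express.
   # We only print the command here; executing it requires Docker plus
   # the NVIDIA container runtime.
   SERVICE_IMAGE="apache/arrow-dev:ubuntu-cuda-docs"    # hypothetical tag
   SERVICE_SCRIPT="/arrow/ci/scripts/docs_build.sh"     # hypothetical path
   CMD=(docker run --rm --gpus all -v "$PWD:/arrow" "$SERVICE_IMAGE" "$SERVICE_SCRIPT")
   printf '%s\n' "${CMD[*]}"
   ```

   The array form keeps each argument intact when the command is eventually executed with `"${CMD[@]}"`.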







[GitHub] [arrow] github-actions[bot] commented on pull request #7439: ARROW-4309: [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7439:
URL: https://github.com/apache/arrow/pull/7439#issuecomment-644329406


   https://issues.apache.org/jira/browse/ARROW-4309







[GitHub] [arrow] pitrou commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


pitrou commented on pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#issuecomment-644335590


   (I cancelled the GHA builds since they shouldn't be affected by this PR's changes)







[GitHub] [arrow] bkietz opened a new pull request #7440: ARROW-8631: [C++][Dataset][Python] Raise in discovery on unparsable partition expression

2020-06-15 Thread GitBox


bkietz opened a new pull request #7440:
URL: https://github.com/apache/arrow/pull/7440


   







[GitHub] [arrow] github-actions[bot] commented on pull request #7440: ARROW-8631: [C++][Dataset][Python] Raise in discovery on unparsable partition expression

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7440:
URL: https://github.com/apache/arrow/pull/7440#issuecomment-644356263


   https://issues.apache.org/jira/browse/ARROW-8631







[GitHub] [arrow] ctring commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


ctring commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644373157


   @nealrichardson Once I made all the changes in the CMake file, I could pass 
the boost_ep build phase but failed at thrift_ep with the error that 
`typeof/incr_registration_group.hpp` was missing.
   ```
   ...\build\boost_ep-prefix\src\boost_ep\boost/scope_exit.hpp(257,10): fatal 
error C1083: Cannot open include file: 
'boost/typeof/incr_registration_group.hpp': No such file or directory 
(compiling source file 
C:\source\arrow\cpp\build\thrift_ep-prefix\src\thrift_ep\lib\cpp\src\thrift\windows\OverlappedSubmissionThread.cpp)
 
[C:\source\arrow\cpp\build\thrift_ep-prefix\src\thrift_ep-build\lib\cpp\thrift_static.vcxproj]
   
   ```
   From the error, you can tell why this happened on Windows but not on Linux: Thrift additionally compiles `windows\OverlappedSubmissionThread.cpp` on Windows, which requires `boost/scope_exit.hpp`, which in turn includes `boost/typeof/incr_registration_group.hpp`.
   
   I then copied the `boost/typeof/incr_registration_group.hpp` from the 
original Boost source to the boost_ep folder in the build tree. After that, I 
could build everything successfully, which means that header was the only 
missing file.
   
   I also ran the `trim-boost.sh` script to make sure the missing header file 
would be in the new bundle.
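   A cheap guard against regressing this again is to assert, before uploading, that the trimmed tarball still lists the headers Windows-only sources pull in transitively. A self-contained sketch follows: it builds a toy bundle so it runs standalone; in practice you would point the check at the real output of `trim-boost.sh`, and the header list here is illustrative.

   ```shell
   # Create a toy "trimmed bundle" so this sketch runs without the real archive.
   mkdir -p demo/boost/typeof
   touch demo/boost/typeof/incr_registration_group.hpp
   tar czf demo_bundle.tar.gz -C demo boost

   # Check the archive listing for each required header (exact-line match).
   for hdr in boost/typeof/incr_registration_group.hpp boost/scope_exit.hpp; do
     if tar tzf demo_bundle.tar.gz | grep -qx "$hdr"; then
       echo "present: $hdr"
     else
       echo "MISSING: $hdr"
     fi
   done
   ```

   Here the loop reports the first header as present (we created it) and the second as missing, which is exactly the failure mode the check is meant to catch before upload.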







[GitHub] [arrow] nealrichardson opened a new pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

2020-06-15 Thread GitBox


nealrichardson opened a new pull request #7441:
URL: https://github.com/apache/arrow/pull/7441


   







[GitHub] [arrow] github-actions[bot] commented on pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7441:
URL: https://github.com/apache/arrow/pull/7441#issuecomment-644380381


   https://issues.apache.org/jira/browse/ARROW-3446







[GitHub] [arrow] kou commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package

2020-06-15 Thread GitBox


kou commented on pull request #6729:
URL: https://github.com/apache/arrow/pull/6729#issuecomment-644387504


   @liyafan82 OK. Could you open a JIRA issue for Spark to notify the Spark developers about this? https://issues.apache.org/jira/browse/SPARK
   FYI: https://spark.apache.org/contributing.html
   
   @kiszk Could you support updating Spark for this change?







[GitHub] [arrow] kou commented on pull request #7335: ARROW-9018: [C++] Remove APIs that were marked as deprecated in 0.17.0 and prior

2020-06-15 Thread GitBox


kou commented on pull request #7335:
URL: https://github.com/apache/arrow/pull/7335#issuecomment-644392554


   Great!







[GitHub] [arrow] kou commented on pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check

2020-06-15 Thread GitBox


kou commented on pull request #7433:
URL: https://github.com/apache/arrow/pull/7433#issuecomment-644393428


   @xhochy Could you take a look the 
https://github.com/apache/arrow/pull/7433#issuecomment-643824491 error?







[GitHub] [arrow] kou commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


kou commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644397052


   > Since the builds have been green, we must not have any CI jobs that test 
building with bundled boost and thrift on MSVC. Should we add one (nightly 
perhaps)? @kou thoughts?
   
   Yes.
   We should add a job for the case to our nightly test jobs.







[GitHub] [arrow] kou commented on a change in pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-15 Thread GitBox


kou commented on a change in pull request #7436:
URL: https://github.com/apache/arrow/pull/7436#discussion_r440463018



##
File path: python/manylinux1/scripts/build_boost.sh
##
@@ -16,12 +16,12 @@
 # specific language governing permissions and limitations
 # under the License.
 
-BOOST_VERSION=1.68.0
+BOOST_VERSION=1.73.0
 BOOST_VERSION_UNDERSCORE=${BOOST_VERSION//\./_}
 NCORES=$(($(grep -c ^processor /proc/cpuinfo) + 1))
 
-curl -sL 
https://sourceforge.net/projects/boost/files/boost/${BOOST_VERSION}/boost_${BOOST_VERSION_UNDERSCORE}.tar.gz
 -o /boost_${BOOST_VERSION_UNDERSCORE}.tar.gz
-tar xf boost_${BOOST_VERSION_UNDERSCORE}.tar.gz
+curl -sL 
https://dl.bintray.com/boostorg/release/${BOOST_VERSION}/source/boost_${BOOST_VERSION_UNDERSCORE}.tar.bz2
 -o /boost_${BOOST_VERSION_UNDERSCORE}.tar.bz2

Review comment:
   Could you keep the sourceforge.net URL? dl.bintray.com/boostorg/ constantly hits its download limit. See also:
https://github.com/boostorg/boost/issues/299
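   One way to accommodate both mirrors is a fallback download: try one URL, and fall back to the other on failure. The sketch below stubs out the network call so it runs offline; a real script would use `curl -sfL` directly, and the stub's rate-limit behavior is an assumption for illustration only.

   ```shell
   # Stub standing in for `curl -sfL "$1" -o <file>`; it simulates the
   # bintray mirror being rate-limited (22 is curl's HTTP-error exit code).
   fetch() {
     case "$1" in
       *bintray*) return 22 ;;
       *)         echo "downloaded from $1" ;;
     esac
   }

   BOOST_VERSION=1.73.0
   BOOST_VERSION_UNDERSCORE=${BOOST_VERSION//./_}
   PRIMARY="https://dl.bintray.com/boostorg/release/${BOOST_VERSION}/source/boost_${BOOST_VERSION_UNDERSCORE}.tar.bz2"
   FALLBACK="https://sourceforge.net/projects/boost/files/boost/${BOOST_VERSION}/boost_${BOOST_VERSION_UNDERSCORE}.tar.gz"

   # Try the rate-limited mirror first, then fall back to SourceForge.
   fetch "$PRIMARY" || fetch "$FALLBACK"
   ```

   With the stub, the first call fails and the SourceForge URL is used, which is the behavior you would want from the wheel build script when bintray throttles.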

##
File path: cpp/cmake_modules/FindgRPCAlt.cmake
##
@@ -169,13 +169,14 @@ if(gRPCAlt_FOUND)
INTERFACE_INCLUDE_DIRECTORIES 
"${GRPC_INCLUDE_DIR}")
 
   add_library(gRPC::grpc UNKNOWN IMPORTED)
-  set_target_properties(gRPC::grpc

Review comment:
   Adding `ZLIB::ZLIB` and `c-ares::cares` to `INTERFACE_LINK_LIBRARIES` is 
reasonable because `libgrpc++.a` depends on them.









[GitHub] [arrow] nealrichardson commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


nealrichardson commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644410813


   Maybe adding `BOOST_SOURCE: BUNDLED` around 
[here](https://github.com/apache/arrow/blob/master/.github/workflows/cpp.yml#L179)
 would be sufficient? The job is already building Thrift.







[GitHub] [arrow] kou commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


kou commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644412255


   It will work but it may increase CI time for "C++ / AMD64 Windows 2019 C++".
   Increasing CI time may not be acceptable.







[GitHub] [arrow] nealrichardson commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


nealrichardson commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644413986


   I suspect it wouldn't increase build time much, at least not enough to block 
things. Where would you prefer to run the test? Do we run windows nightlies on 
crossbow?







[GitHub] [arrow] kou commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


kou commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644450282


   OK. Let's try it in this pull request. If it doesn't increase build time 
much, we can change the current job.







[GitHub] [arrow] kou commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


kou commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644468001


   Failed. Do we need to re-generate our Boost archive?







[GitHub] [arrow] ctring commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-15 Thread GitBox


ctring commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-644468486


   I believe you need to regenerate and re-upload the trimmed Boost bundle. This failed for the same reason I mentioned above: it is still downloading the faulty bundle.







[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


zhztheplayer commented on a change in pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#discussion_r440075959



##
File path: 
java/dataset/src/main/java/org/apache/arrow/memory/NativeUnderlingMemory.java
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import org.apache.arrow.dataset.jni.JniWrapper;
+
+/**
+ * AllocationManager implementation for Native allocated memory.
+ */
+public class NativeUnderlingMemory extends AllocationManager {

Review comment:
   Hi @jacques-n,
   
   Thanks for your suggestion, but I think the "weaknesses" of the current implementation might be somewhat overstated. For the ORC adapter it is of course urgent to improve the memory allocation process, because all of the memory ORC allocates escapes the Java allocator. However, we don't have the same problem here in Datasets, as the allocator manages all of the native memory. The only issue we may have is that an OOM error might be thrown after a buffer is allocated rather than before we create it. But because the OOM will ultimately be thrown and the over-allocated memory will ultimately be released, the impact will be very small.
   
   And I know we could instead allocate Java memory directly from C++ via JNI, but the code would become messy and performance would get worse. That is why I didn't propose such a solution in this PR.









[GitHub] [arrow] zhztheplayer commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


zhztheplayer commented on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-644494295


   @fsaintjacques Rebased now. 
   
   As we have an unresolved [discussion](https://github.com/apache/arrow/pull/7030#discussion_r440075959) about memory management of native buffers, should we clarify it a bit more before writing any fixes?
   
   My first question is: what's the major problem now? I list some problems 
that I can see in the comments:
   
   1. We are package-hacking `org.apache.arrow.memory` in module 
`arrow-dataset`.
   Yes, `Ownerships.java` uses some hacks, so I've removed the class in the 
latest commits (the whole class is not necessary for the current use case). 
Now only `NativeUnderlyingMemory.java` is still in the package 
`org.apache.arrow.memory`.
   2. We lose Java native heap caps.
   Can we clarify this? I think the explanation could be either that we lose 
caps from JVM direct memory, or that we lose caps from the BufferAllocator. 
For the latter, I think the buffers are already registered with the allocator.
   3. The OOM check is performed after the buffer is created.
   Yes. But is this really a big problem? See 
[comment](https://github.com/apache/arrow/pull/7030#discussion_r440075959).
   







[GitHub] [arrow] zhztheplayer edited a comment on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-15 Thread GitBox


zhztheplayer edited a comment on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-644494295


   @fsaintjacques Rebased now. 
   
   As we have an unresolved 
[discussion](https://github.com/apache/arrow/pull/7030#discussion_r440075959) 
about memory management of native buffers, should we clarify a bit more before 
writing any fixes? (also ping @jacques-n, @emkornfield)
   
   The question is: what's the major problem now? I list some problems that I 
can see in the comments:
   
   1. We are package-hacking `org.apache.arrow.memory` in module 
`arrow-dataset`.
   Yes, `Ownerships.java` uses some hacks, so I've removed the class in the 
latest commits (the whole class is not necessary for the current use case). 
Now only `NativeUnderlyingMemory.java` is still in the package 
`org.apache.arrow.memory`.
   2. We lose Java native heap caps.
   Can we clarify this? I think the explanation could be either that we lose 
caps from JVM direct memory, or that we lose caps from the BufferAllocator. 
For the latter, I think the buffers are already registered with the allocator.
   3. The OOM check is performed after the buffer is created.
   Yes. But is this really a big problem? See 
[comment](https://github.com/apache/arrow/pull/7030#discussion_r440075959).
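   To make the trade-off in point 3 concrete, here is a minimal Python sketch 
(purely illustrative; `CappedAccountant` and its method names are hypothetical, 
not Arrow's allocator API) contrasting reserving against a cap before 
allocating with accounting only after the native allocation has happened:

```python
class CappedAccountant:
    """Toy memory accountant with a hard cap, loosely modeled on the
    role a BufferAllocator plays. All names here are hypothetical."""

    def __init__(self, cap):
        self.cap = cap
        self.used = 0

    def reserve_then_allocate(self, size):
        # OOM is detected *before* any memory is allocated.
        if self.used + size > self.cap:
            raise MemoryError("would exceed cap")
        self.used += size
        return bytearray(size)  # stands in for the native allocation

    def allocate_then_account(self, size):
        # Mirrors the JNI flow discussed above: the native side has
        # already allocated, and accounting happens afterwards.
        buf = bytearray(size)
        self.used += size
        if self.used > self.cap:
            # OOM is still raised, only after a transient over-allocation;
            # the buffer is freed when this frame unwinds.
            self.used -= size
            raise MemoryError("cap exceeded after the fact")
        return buf


acct = CappedAccountant(cap=1024)
buf = acct.reserve_then_allocate(512)

try:
    acct.reserve_then_allocate(1024)  # rejected up front
except MemoryError:
    pass
assert acct.used == 512

try:
    acct.allocate_then_account(1024)  # rejected after a transient spike
except MemoryError:
    pass
assert acct.used == 512
```

   Both orderings end up enforcing the cap; the second merely tolerates a 
transient over-allocation before the OOM error is raised, which is the "very 
small impact" argued above.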
   







[GitHub] [arrow] andygrove commented on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-15 Thread GitBox


andygrove commented on pull request #7297:
URL: https://github.com/apache/arrow/pull/7297#issuecomment-644505701


   @nealrichardson I think that would be a reasonable assumption to make at 
this point. I'd be happy with this merged even if only a small number of tests 
are passing, and we can iterate from there. Sorry, I'd help more, but I'm 
working long hours in the day job currently. I should have time before the 
release to re-engage with this.







[GitHub] [arrow] wesm opened a new pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox


wesm opened a new pull request #7442:
URL: https://github.com/apache/arrow/pull/7442


   NOTE: the diff is artificially large due to some code rearranging 
(necessitated by how some data selection code is shared between the Take and 
Filter implementations).
   
   Summary:
   
   * Filter is now 1.5-6x faster across the board, most notably on primitive 
types with high selectivity filters. The BitBlockCounters do a lot of the heavy 
lifting in that case but even in the worst case scenario when the block 
counters never encounter a "full" block, this is still consistently faster.
   * Total -O3 code size for **both** Take and Filter is now about 600KB. 
That's down from about 8MB total prior to this patch and ARROW-5760
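   As a rough illustration of why the block counters help (a simplified 
sketch of the idea only, not the actual `BitBlockCounter` API): by classifying 
each 64-bit word of the filter bitmap as full, empty, or mixed, a filter 
kernel only falls back to bit-by-bit work for the mixed words.

```python
def filter_with_block_counts(values, mask_words, length):
    """Filter `values` by a selection bitmap packed into 64-bit words.

    All-ones and all-zeros words are handled without inspecting
    individual bits, mimicking the fast paths a block counter enables.
    """
    out = []
    for w, word in enumerate(mask_words):
        start = w * 64
        n = min(64, length - start)
        full = (1 << n) - 1
        block = word & full
        if block == full:        # full block: bulk copy, no bit tests
            out.extend(values[start:start + n])
        elif block == 0:         # empty block: skip entirely
            continue
        else:                    # mixed block: per-bit slow path
            for i in range(n):
                if (block >> i) & 1:
                    out.append(values[start + i])
    return out


values = list(range(130))
# First word all selected, second word none, last word selects only bit 1.
mask_words = [(1 << 64) - 1, 0, 0b10]
result = filter_with_block_counts(values, mask_words, 130)
assert result == list(range(64)) + [129]
```

   High-selectivity filters hit the "full block" path almost every word, which 
matches where the biggest speedups above show up.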
   
   Some incidental changes:
   * Implemented a fast conversion from boolean filter to take indices (aka 
"selection vector"),  `compute::internal::GetTakeIndices`. I have also altered 
the implementation of filtering a record batch to use this, which should be 
faster (it would be good to have some benchmarks to confirm this). 
   * Various expansions to the BitBlockCounter classes that I needed to support 
this work
   * Fixed a bug (ARROW-9142) with RandomArrayGenerator::Boolean. The 
probability parameter was being interpreted as the probability of a false 
value rather than the probability of a true value. IIUC, with Bernoulli 
distributions the probability specified is P(X = 1), not P(X = 0). Could 
someone please confirm this?
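   The boolean-filter-to-take-indices conversion mentioned above can be 
sketched in a few lines of plain Python (a conceptual model only; the real 
`GetTakeIndices` operates on Arrow bitmaps in C++):

```python
def get_take_indices(mask):
    """Convert a boolean selection mask into 'take' indices
    (a selection vector)."""
    return [i for i, keep in enumerate(mask) if keep]


def take(values, indices):
    return [values[i] for i in indices]


def filter_(values, mask):
    return [v for v, keep in zip(values, mask) if keep]


values = ["a", "b", "c", "d"]
mask = [True, False, True, True]

indices = get_take_indices(mask)
assert indices == [0, 2, 3]
# Filtering directly and taking by the converted indices agree; this is
# what lets a record batch be filtered column by column after a single
# mask-to-indices conversion.
assert take(values, indices) == filter_(values, mask) == ["a", "c", "d"]
```

   The win for record batches is that the mask is decoded once, and each 
column then does a simple gather by index.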







[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox


wesm commented on pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#issuecomment-644509797


   Here's benchmark runs on my machine
   
   * BEFORE: https://gist.github.com/wesm/857a3179e7dbc928d3325b1e7f687086
   * AFTER: https://gist.github.com/wesm/ad07cec1613b6327926dfe1d95e7f4f0
   
   IF YOU WANT TO BENCHMARK YOURSELF, PLEASE USE THIS BRANCH 
https://github.com/wesm/arrow/tree/ARROW-9075-comparison. It contains the 
RandomArrayGenerator::Boolean change and some other changes to the benchmarks 
without which the results will be non-comparable







[GitHub] [arrow] wesm edited a comment on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox


wesm edited a comment on pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#issuecomment-644509797


   Here's benchmark runs on my machine
   
   * BEFORE: https://gist.github.com/wesm/857a3179e7dbc928d3325b1e7f687086
   * AFTER: https://gist.github.com/wesm/ad07cec1613b6327926dfe1d95e7f4f0
   
   **If you want to benchmark yourself, please use this branch for the 
"before":** https://github.com/wesm/arrow/tree/ARROW-9075-comparison. It 
contains the RandomArrayGenerator::Boolean change and some other changes to the 
benchmarks without which the results will be non-comparable







[GitHub] [arrow] wesm commented on a change in pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox


wesm commented on a change in pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#discussion_r440565678



##
File path: cpp/src/arrow/testing/random.cc
##
@@ -84,7 +84,7 @@ std::shared_ptr<Array> RandomArrayGenerator::Boolean(int64_t size, double probability
 
   BufferVector buffers{2};
   // Need 2 distinct generators such that probabilities are not shared.
-  GenOpt value_gen(seed(), 0, 1, probability);
+  GenOpt value_gen(seed(), 0, 1, 1 - probability);

Review comment:
   @pitrou or @fsaintjacques could you please confirm this change
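   For reference, the convention in question: for a Bernoulli(p) draw, p is 
P(X = 1). A quick stdlib sanity check of that interpretation (independent of 
Arrow's generator):

```python
import random


def bernoulli(p, n, seed=0):
    """Seeded Bernoulli(p) draws, with p interpreted as P(X = 1)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


draws = bernoulli(0.25, 100_000)
frac_ones = sum(draws) / len(draws)
# With p = P(X = 1), roughly a quarter of the draws should be ones.
assert abs(frac_ones - 0.25) < 0.01
```

   Under this convention, passing `1 - probability` to a generator that draws 
a *false* with the given probability restores p = P(X = 1), which is what the 
diff above does.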









[GitHub] [arrow] github-actions[bot] commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox


github-actions[bot] commented on pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#issuecomment-644510405


   https://issues.apache.org/jira/browse/ARROW-9075







[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox


wesm commented on pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#issuecomment-644513681


   To show some simple numbers for the perf before and after in Python, this 
example has a high-selectivity filter (all but one value selected) and a 
low-selectivity filter (only 1% of values selected):
   
   ```
   import numpy as np
   import pandas as pd
   import pyarrow as pa
   import pyarrow.compute as pc
   
   string_values = pa.array([pd.util.testing.rands(16)
 for i in range(1)] * 100)
   double_values = pa.array(np.random.randn(100))
   
   all_but_one = np.ones(len(string_values), dtype=bool)
   all_but_one[50] = False
   
   only_1pct = np.array(np.random.binomial(1, 0.01, size=100), dtype=bool)
   ```
   
   before:
   
   ```
   In [4]: timeit pc.filter(double_values, only_1pct)   
 
   2.01 ms ± 7.89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   
   In [5]: timeit pc.filter(double_values, all_but_one) 
 
   5.74 ms ± 17.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   
   In [6]: timeit pc.filter(string_values, only_1pct)   
 
   2.21 ms ± 6.87 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   
   In [7]: timeit pc.filter(string_values, all_but_one) 
 
   11.4 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   ```
   
   after
   
   ```
   In [29]: timeit pc.filter(double_values, only_1pct)  
 
   1.43 ms ± 3.79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [30]: timeit pc.filter(double_values, all_but_one)
 
   1.81 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [31]: timeit pc.filter(string_values, only_1pct)  
 
   1.57 ms ± 4.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [32]: timeit pc.filter(string_values, all_but_one)
 
   6.66 ms ± 39.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   ```







[GitHub] [arrow] wesm edited a comment on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-15 Thread GitBox


wesm edited a comment on pull request #7442:
URL: https://github.com/apache/arrow/pull/7442#issuecomment-644513681


   To show some simple numbers for the perf before and after in Python, this 
example has a high-selectivity filter (all but one value selected) and 
low-selectivity filters (1/100 and 1/1000):
   
   ```
   import numpy as np
   import pandas as pd
   import pyarrow as pa
   import pyarrow.compute as pc
   
   string_values = pa.array([pd.util.testing.rands(16)
 for i in range(1)] * 100)
   double_values = pa.array(np.random.randn(100))
   
   all_but_one = np.ones(len(string_values), dtype=bool)
   all_but_one[50] = False
   
   one_in_100 = np.array(np.random.binomial(1, 0.01, size=100), dtype=bool)
   one_in_1000 = np.array(np.random.binomial(1, 0.001, size=100), 
dtype=bool)
   ```
   
   before:
   
   ```
   In [2]: timeit pc.filter(double_values, one_in_100)  
  
   2.06 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   
   In [3]: timeit pc.filter(double_values, one_in_1000) 
  
   1.82 ms ± 3.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [4]: timeit pc.filter(double_values, all_but_one) 
  
   5.75 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   
   In [5]: timeit pc.filter(string_values, one_in_100)  
  
   2.23 ms ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   
   In [6]: timeit pc.filter(string_values, one_in_1000) 
  
   1.85 ms ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [7]: timeit pc.filter(string_values, all_but_one) 
  
   11.6 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   ```
   
   after
   
   ```
   In [4]: timeit pc.filter(double_values, one_in_100)  
 
   1.1 ms ± 7.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [5]: timeit pc.filter(double_values, one_in_1000)
   531 µs ± 8.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [7]: timeit pc.filter(double_values, all_but_one) 
 
   1.83 ms ± 7.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [10]: timeit pc.filter(string_values, one_in_100) 
  
   1.28 ms ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [11]: timeit pc.filter(string_values, one_in_1000)
  
   561 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   
   In [12]: timeit pc.filter(string_values, all_but_one)
  
   6.66 ms ± 34.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   ```
   
   EDIT: updated benchmarks for low-selectivity optimization






