[GitHub] [arrow] liyafan82 commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-03-07 Thread GitBox


liyafan82 commented on a change in pull request #8949:
URL: https://github.com/apache/arrow/pull/8949#discussion_r589202945



##
File path: 
java/compression/src/main/java/org/apache/arrow/compression/Lz4CompressionCodec.java
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.util.MemoryUtil;
+import org.apache.arrow.util.Preconditions;
+import org.apache.arrow.vector.compression.CompressionCodec;
+import org.apache.arrow.vector.compression.CompressionUtil;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorInputStream;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorOutputStream;
+import org.apache.commons.compress.utils.IOUtils;
+
+import io.netty.util.internal.PlatformDependent;
+
+/**
+ * Compression codec for the LZ4 algorithm.
+ */
+public class Lz4CompressionCodec implements CompressionCodec {
+
+  @Override
+  public ArrowBuf compress(BufferAllocator allocator, ArrowBuf 
uncompressedBuffer) {
+Preconditions.checkArgument(uncompressedBuffer.writerIndex() <= 
Integer.MAX_VALUE,
+"The uncompressed buffer size exceeds the integer limit");
+
+if (uncompressedBuffer.writerIndex() == 0L) {
+  // shortcut for empty buffer
+  ArrowBuf compressedBuffer = 
allocator.buffer(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH);
+  compressedBuffer.setLong(0, 0);
+  
compressedBuffer.writerIndex(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH);
+  uncompressedBuffer.close();
+  return compressedBuffer;
+}
+
+try {
+  ArrowBuf compressedBuffer = doCompress(allocator, uncompressedBuffer);
+  long compressedLength = compressedBuffer.writerIndex() - 
CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH;
+  if (compressedLength > uncompressedBuffer.writerIndex()) {
+// compressed buffer is larger, send the raw buffer
+compressedBuffer.close();
+compressedBuffer = CompressionUtil.compressRawBuffer(allocator, 
uncompressedBuffer);
+  }
+
+  uncompressedBuffer.close();
+  return compressedBuffer;
+} catch (IOException e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  private ArrowBuf doCompress(BufferAllocator allocator, ArrowBuf 
uncompressedBuffer) throws IOException {
+byte[] inBytes = new byte[(int) uncompressedBuffer.writerIndex()];

Review comment:
   Sure. I have opened ARROW-11901 to track it.
   
   I also want to share some thoughts briefly: to use the native memory 
directly, we need a way to wrap `ByteBuffers` as input/output streams (In Java, 
the only "standard" way to access the off-heap memory is through the 
`DirectByteBuffer`). 
   
   We need some third party library to achieve that. We also need to evaluate 
the performance thereafter, because the Commons-compress library also uses 
on-heap data extensively, the copy between on-heap and off-heap data can be 
difficult to avoid. 
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] sundy-li closed pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

2021-03-07 Thread GitBox


sundy-li closed pull request #9602:
URL: https://github.com/apache/arrow/pull/9602


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-03-07 Thread GitBox


liyafan82 commented on a change in pull request #8949:
URL: https://github.com/apache/arrow/pull/8949#discussion_r589171170



##
File path: 
java/compression/src/main/java/org/apache/arrow/compression/Lz4CompressionCodec.java
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.util.MemoryUtil;
+import org.apache.arrow.util.Preconditions;
+import org.apache.arrow.vector.compression.CompressionCodec;
+import org.apache.arrow.vector.compression.CompressionUtil;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorInputStream;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorOutputStream;
+import org.apache.commons.compress.utils.IOUtils;
+
+import io.netty.util.internal.PlatformDependent;
+
+/**
+ * Compression codec for the LZ4 algorithm.
+ */
+public class Lz4CompressionCodec implements CompressionCodec {
+
+  @Override
+  public ArrowBuf compress(BufferAllocator allocator, ArrowBuf 
uncompressedBuffer) {
+Preconditions.checkArgument(uncompressedBuffer.writerIndex() <= 
Integer.MAX_VALUE,
+"The uncompressed buffer size exceeds the integer limit");
+
+if (uncompressedBuffer.writerIndex() == 0L) {
+  // shortcut for empty buffer
+  ArrowBuf compressedBuffer = 
allocator.buffer(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH);
+  compressedBuffer.setLong(0, 0);
+  
compressedBuffer.writerIndex(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH);
+  uncompressedBuffer.close();
+  return compressedBuffer;
+}
+
+try {
+  ArrowBuf compressedBuffer = doCompress(allocator, uncompressedBuffer);
+  long compressedLength = compressedBuffer.writerIndex() - 
CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH;
+  if (compressedLength > uncompressedBuffer.writerIndex()) {
+// compressed buffer is larger, send the raw buffer
+compressedBuffer.close();
+compressedBuffer = CompressionUtil.compressRawBuffer(allocator, 
uncompressedBuffer);
+  }
+
+  uncompressedBuffer.close();
+  return compressedBuffer;
+} catch (IOException e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  private ArrowBuf doCompress(BufferAllocator allocator, ArrowBuf 
uncompressedBuffer) throws IOException {
+byte[] inBytes = new byte[(int) uncompressedBuffer.writerIndex()];
+PlatformDependent.copyMemory(uncompressedBuffer.memoryAddress(), inBytes, 
0, uncompressedBuffer.writerIndex());
+ByteArrayOutputStream baos = new ByteArrayOutputStream();
+try (InputStream in = new ByteArrayInputStream(inBytes);
+ OutputStream out = new FramedLZ4CompressorOutputStream(baos)) {
+  IOUtils.copy(in, out);
+}
+
+byte[] outBytes = baos.toByteArray();
+
+ArrowBuf compressedBuffer = 
allocator.buffer(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH + outBytes.length);
+
+long uncompressedLength = uncompressedBuffer.writerIndex();
+if (!MemoryUtil.LITTLE_ENDIAN) {
+  uncompressedLength = Long.reverseBytes(uncompressedLength);
+}
+// first 8 bytes reserved for uncompressed length, to be consistent with 
the
+// C++ implementation.
+compressedBuffer.setLong(0, uncompressedLength);
+
+PlatformDependent.copyMemory(
+outBytes, 0, compressedBuffer.memoryAddress() + 
CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH, outBytes.length);
+compressedBuffer.writerIndex(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH + 
outBytes.length);
+return compressedBuffer;
+  }
+
+  @Override
+  public ArrowBuf decompress(BufferAllocator allocator, ArrowBuf 
compressedBuffer) {
+Preconditions.checkArgument(compressedBuffer.writerIndex() <= 
Integer.MAX_VALUE,
+"The compressed buffer size exceeds the integer limit");
+
+Preconditions.checkArgument(compressedBuffer.writerIndex() >= 

[GitHub] [arrow] liyafan82 commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-03-07 Thread GitBox


liyafan82 commented on a change in pull request #8949:
URL: https://github.com/apache/arrow/pull/8949#discussion_r589169372



##
File path: 
java/compression/src/main/java/org/apache/arrow/compression/Lz4CompressionCodec.java
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.util.MemoryUtil;
+import org.apache.arrow.util.Preconditions;
+import org.apache.arrow.vector.compression.CompressionCodec;
+import org.apache.arrow.vector.compression.CompressionUtil;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorInputStream;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorOutputStream;
+import org.apache.commons.compress.utils.IOUtils;
+
+import io.netty.util.internal.PlatformDependent;
+
+/**
+ * Compression codec for the LZ4 algorithm.
+ */
+public class Lz4CompressionCodec implements CompressionCodec {

Review comment:
   Good suggestion. I have opened ARROW-11899 to track it. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


kou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792388622


   Sorry for that... :-)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792387596


   @kou  Really thanks! I didn't realize that arrow::Type::type::MAP was not 
fully supported!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wqc200 commented on pull request #9600: ARROW-11822: [Rust][Datafusion] Support case sensitive for function

2021-03-07 Thread GitBox


wqc200 commented on pull request #9600:
URL: https://github.com/apache/arrow/pull/9600#issuecomment-792376112


   @alamb 
   OK, I think we should add support for different dialects



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] seddonm1 commented on pull request #9646: ARROW-11895: [Rust][DataFusion] Add support for more column statistics

2021-03-07 Thread GitBox


seddonm1 commented on pull request #9646:
URL: https://github.com/apache/arrow/pull/9646#issuecomment-792375427


   > FYI @andygrove I plan to use this later for some `ANALYZE / COMPUTE 
STATISTICS` support (would be useful for in memory data) and/or use parquet 
statistics to make the join order optimization more advanced.
   
   This is really cool @Dandandan . And we don't have to make apply the silly 
limitation that Spark does where the table has to be registered in a Hive 
Metastore to allow you to calculate statistics!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9654: ARROW-11656: [Rust][DataFusion] Remaining Postgres String functions

2021-03-07 Thread GitBox


github-actions[bot] commented on pull request #9654:
URL: https://github.com/apache/arrow/pull/9654#issuecomment-792373859


   https://issues.apache.org/jira/browse/ARROW-11656



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] seddonm1 opened a new pull request #9654: ARROW-11656: [Rust][DataFusion] Remaining Postgres String functions

2021-03-07 Thread GitBox


seddonm1 opened a new pull request #9654:
URL: https://github.com/apache/arrow/pull/9654


   @alamb the last one (for now).
   
   This PR does a few things:
   
   - adds `regexp_replace`, `replace`, `split_part`, `starts_with`, `strpos` 
and `translate`.
   - adds feature flag `unicode_expressions` and moves anything that depends on 
`unicode-segmentation` crate into it.
   - adds feature flag `regex_expressions` and adds `regex` and `lazy_static` 
crates to it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels [WIP]

2021-03-07 Thread GitBox


kszucs commented on pull request #9285:
URL: https://github.com/apache/arrow/pull/9285#issuecomment-792364856


   @eladroz thanks for working on this. We should try to update our pinned 
vcpkg version and report any compilation issues upstream.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #8491: ARROW-10349: [Python] linux aarch64 wheels

2021-03-07 Thread GitBox


kszucs commented on pull request #8491:
URL: https://github.com/apache/arrow/pull/8491#issuecomment-792364067


   We should favor working on https://github.com/apache/arrow/pull/9285 using 
vcpkg.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] codecov-io edited a comment on pull request #9494: ARROW-11626: [Rust][DataFusion] Move [DataFusion] examples to own project

2021-03-07 Thread GitBox


codecov-io edited a comment on pull request #9494:
URL: https://github.com/apache/arrow/pull/9494#issuecomment-778826002


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9494?src=pr=h1) Report
   > Merging 
[#9494](https://codecov.io/gh/apache/arrow/pull/9494?src=pr=desc) (edbea69) 
into 
[master](https://codecov.io/gh/apache/arrow/commit/bfa99d98e6f0b49aed079d9f99668390f84a933f?el=desc)
 (bfa99d9) will **decrease** coverage by `0.07%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow/pull/9494/graphs/tree.svg?width=650=150=pr=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9494?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#9494  +/-   ##
   ==
   - Coverage   82.49%   82.42%   -0.08% 
   ==
 Files 245  245  
 Lines   5734757646 +299 
   ==
   + Hits4731147517 +206 
   - Misses  1003610129  +93 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow/pull/9494?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[rust/datafusion-examples/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uLWV4YW1wbGVzL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=)
 | `0.00% <ø> (ø)` | |
   | 
[rust/datafusion-examples/examples/simple\_udaf.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uLWV4YW1wbGVzL2V4YW1wbGVzL3NpbXBsZV91ZGFmLnJz)
 | `0.00% <ø> (ø)` | |
   | 
[rust/parquet/src/basic.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9iYXNpYy5ycw==)
 | `87.22% <0.00%> (-10.04%)` | :arrow_down: |
   | 
[rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=)
 | `77.53% <0.00%> (-0.08%)` | :arrow_down: |
   | 
[rust/parquet/src/schema/parser.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvcGFyc2VyLnJz)
 | `90.20% <0.00%> (ø)` | |
   | 
[rust/parquet/src/file/footer.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL2Zvb3Rlci5ycw==)
 | `96.26% <0.00%> (+0.03%)` | :arrow_up: |
   | 
[rust/parquet/src/file/writer.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL3dyaXRlci5ycw==)
 | `95.32% <0.00%> (+0.21%)` | :arrow_up: |
   | 
[rust/parquet/src/schema/types.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvdHlwZXMucnM=)
 | `89.79% <0.00%> (+0.25%)` | :arrow_up: |
   | 
[rust/parquet/src/file/metadata.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL21ldGFkYXRhLnJz)
 | `92.96% <0.00%> (+0.74%)` | :arrow_up: |
   | 
[rust/arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow/pull/9494/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWwvdXRpbHMucnM=)
 | `76.47% <0.00%> (+0.98%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow/pull/9494?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow/pull/9494?src=pr=footer). Last 
update 
[bfa99d9...edbea69](https://codecov.io/gh/apache/arrow/pull/9494?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #9483: ARROW-11610: [C++] Download boost from sourceforge instead of bintray

2021-03-07 Thread GitBox


kou commented on pull request #9483:
URL: https://github.com/apache/arrow/pull/9483#issuecomment-792362001


   FYI: We'll drop support for Ubuntu 16.04 LTS soon (2021-04 is EOL of Ubuntu 
16.04 LTS) and Ubuntu 18.04 LTS or later ship CMake 3.10 or later. We can use 
CMake 3.17.5 from EPEL on CentOS 7.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou edited a comment on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


mathyingzhou edited a comment on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792360387


   @kou Thanks! I think I have already made all the changes in GLib (more 
precisely the Ruby tests) I think I need to make. Not sure why adding MAPs to 
buildable.rb led to a complaint about data_type = builder.value_data_type 
though. If I understand it correctly value_data_type is actually the Type 
struct in type_fwd.h. Not sure why we get “cannot create instance of abstract 
(non-instantiatable) type ‘GArrowDataType’”.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] codecov-io commented on pull request #9645: ARROW-11894: [Rust][DataFusion] Change flight server example to use DataFrame API

2021-03-07 Thread GitBox


codecov-io commented on pull request #9645:
URL: https://github.com/apache/arrow/pull/9645#issuecomment-792361661


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9645?src=pr=h1) Report
   > Merging 
[#9645](https://codecov.io/gh/apache/arrow/pull/9645?src=pr=desc) (b86061b) 
into 
[master](https://codecov.io/gh/apache/arrow/commit/bfa99d98e6f0b49aed079d9f99668390f84a933f?el=desc)
 (bfa99d9) will **decrease** coverage by `0.06%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow/pull/9645/graphs/tree.svg?width=650=150=pr=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9645?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#9645  +/-   ##
   ==
   - Coverage   82.49%   82.43%   -0.07% 
   ==
 Files 245  245  
 Lines   5734757643 +296 
   ==
   + Hits4731147517 +206 
   - Misses  1003610126  +90 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow/pull/9645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[rust/datafusion/examples/flight\_server.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL2ZsaWdodF9zZXJ2ZXIucnM=)
 | `0.00% <0.00%> (ø)` | |
   | 
[rust/parquet/src/basic.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9iYXNpYy5ycw==)
 | `87.22% <0.00%> (-10.04%)` | :arrow_down: |
   | 
[rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=)
 | `77.53% <0.00%> (-0.08%)` | :arrow_down: |
   | 
[rust/parquet/src/schema/parser.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvcGFyc2VyLnJz)
 | `90.20% <0.00%> (ø)` | |
   | 
[rust/parquet/src/file/footer.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL2Zvb3Rlci5ycw==)
 | `96.26% <0.00%> (+0.03%)` | :arrow_up: |
   | 
[rust/parquet/src/file/writer.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL3dyaXRlci5ycw==)
 | `95.32% <0.00%> (+0.21%)` | :arrow_up: |
   | 
[rust/parquet/src/schema/types.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvdHlwZXMucnM=)
 | `89.79% <0.00%> (+0.25%)` | :arrow_up: |
   | 
[rust/parquet/src/file/metadata.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL21ldGFkYXRhLnJz)
 | `92.96% <0.00%> (+0.74%)` | :arrow_up: |
   | 
[rust/arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow/pull/9645/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWwvdXRpbHMucnM=)
 | `76.47% <0.00%> (+0.98%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow/pull/9645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow/pull/9645?src=pr=footer). Last 
update 
[bfa99d9...b86061b](https://codecov.io/gh/apache/arrow/pull/9645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #9483: ARROW-11610: [C++] Download boost from sourceforge instead of bintray

2021-03-07 Thread GitBox


kou commented on pull request #9483:
URL: https://github.com/apache/arrow/pull/9483#issuecomment-792361563


   I like 1. I think that we can require CMake 3.10 or later only when an user 
wants to use bundled Thrift.
   We already requires CMake 3.7 or later only when an user wants to use 
bundled Zstandard.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou edited a comment on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


mathyingzhou edited a comment on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792360387


   @kou Thanks! I think I have already made all the changes in GLib (more 
precisely the Ruby tests) I think I need to make. Not sure why adding MAPs to 
buildable.rb led to a complaint about data_type = builder.value_data_type 
though.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792360387


   @kou Thanks! I think I have already made all the changes in GLib (more 
precisely the Ruby tests) I think I need to make. Not sure why adding maps to 
buildable.rb led to a complaint about data_type = builder.value_data_type 
though.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


kou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792356850


   Sure. I take over the GLib part.
   I'll also reply about the `@rpath` related problem (the `uninitialized 
constant` error is caused by this) on the dev@ mailing list later.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ivanvankov closed pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


ivanvankov closed pull request #9647:
URL: https://github.com/apache/arrow/pull/9647


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on pull request #9592: ARROW-11803: [Rust] [Parquet] Support v2 LogicalType

2021-03-07 Thread GitBox


jorgecarleitao commented on pull request #9592:
URL: https://github.com/apache/arrow/pull/9592#issuecomment-792335369


    Super cool! Thanks a lot @nevi-me ! 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me closed pull request #9592: ARROW-11803: [Rust] [Parquet] Support v2 LogicalType

2021-03-07 Thread GitBox


nevi-me closed pull request #9592:
URL: https://github.com/apache/arrow/pull/9592


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #9650: ARROW-11898: [Rust] Pretty print columns

2021-03-07 Thread GitBox


nevi-me commented on pull request #9650:
URL: https://github.com/apache/arrow/pull/9650#issuecomment-792330634


   > FYI I merged #9653 / 
[ARROW-11896](https://issues.apache.org/jira/browse/ARROW-11896) for the Rust 
CI checks which may affect this PR. If you see "Rust / AMD64 Debian 10 Rust 
stable test workspace" failing with a linker error or no logs, rebasing against 
master will hopefully fix the problem
   
   Thakns, I'm rebasing my PRs



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] codecov-io commented on pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


codecov-io commented on pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#issuecomment-792326554


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9647?src=pr=h1) Report
   > Merging 
[#9647](https://codecov.io/gh/apache/arrow/pull/9647?src=pr=desc) (cb69ca1) 
into 
[master](https://codecov.io/gh/apache/arrow/commit/bfa99d98e6f0b49aed079d9f99668390f84a933f?el=desc)
 (bfa99d9) will **decrease** coverage by `0.00%`.
   > The diff coverage is `82.85%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow/pull/9647/graphs/tree.svg?width=650=150=pr=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9647?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#9647  +/-   ##
   ==
   - Coverage   82.49%   82.49%   -0.01% 
   ==
 Files 245  245  
 Lines   5734757380  +33 
   ==
   + Hits4731147335  +24 
   - Misses  1003610045   +9 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow/pull/9647?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9647/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz)
 | `90.35% <80.32%> (-1.47%)` | :arrow_down: |
   | 
[rust/arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow/pull/9647/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL21vZC5ycw==)
 | `93.21% <100.00%> (+0.02%)` | :arrow_up: |
   | 
[rust/arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow/pull/9647/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWwvdXRpbHMucnM=)
 | `76.47% <0.00%> (+0.98%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow/pull/9647?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow/pull/9647?src=pr=footer). Last 
update 
[bfa99d9...cb69ca1](https://codecov.io/gh/apache/arrow/pull/9647?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9592: ARROW-11803: [Rust] [Parquet] Support v2 LogicalType

2021-03-07 Thread GitBox


alamb commented on pull request #9592:
URL: https://github.com/apache/arrow/pull/9592#issuecomment-792325041


   FYI I merged https://github.com/apache/arrow/pull/9653 / 
[ARROW-11896](https://issues.apache.org/jira/browse/ARROW-11896) for the Rust 
CI checks which may affect this PR. If you see "Rust / AMD64 Debian 10 Rust 
stable test workspace" failing  with a linker error or no logs, rebasing 
against master will hopefully fix the problem



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9650: ARROW-11898: [Rust] Pretty print columns

2021-03-07 Thread GitBox


alamb commented on pull request #9650:
URL: https://github.com/apache/arrow/pull/9650#issuecomment-792325008


   FYI I merged https://github.com/apache/arrow/pull/9653 / 
[ARROW-11896](https://issues.apache.org/jira/browse/ARROW-11896) for the Rust 
CI checks which may affect this PR. If you see "Rust / AMD64 Debian 10 Rust 
stable test workspace" failing  with a linker error or no logs, rebasing 
against master will hopefully fix the problem



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9232: ARROW-10818: [Rust] Implement DecimalType

2021-03-07 Thread GitBox


alamb commented on pull request #9232:
URL: https://github.com/apache/arrow/pull/9232#issuecomment-792324958


   FYI I merged https://github.com/apache/arrow/pull/9653 / 
[ARROW-11896](https://issues.apache.org/jira/browse/ARROW-11896) for the Rust 
CI checks which may affect this PR. If you see "Rust / AMD64 Debian 10 Rust 
stable test workspace" failing  with a linker error or no logs, rebasing 
against master will hopefully fix the problem



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9645: ARROW-11894: [Rust][DataFusion] Change flight server example to use DataFrame API

2021-03-07 Thread GitBox


alamb commented on pull request #9645:
URL: https://github.com/apache/arrow/pull/9645#issuecomment-792324979


   FYI I merged https://github.com/apache/arrow/pull/9653 / 
[ARROW-11896](https://issues.apache.org/jira/browse/ARROW-11896) for the Rust 
CI checks which may affect this PR. If you see "Rust / AMD64 Debian 10 Rust 
stable test workspace" failing  with a linker error or no logs, rebasing 
against master will hopefully fix the problem



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

2021-03-07 Thread GitBox


alamb commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792324683


   > From our side, I am not sure I agree with copy-pasting code from a crate 
to our own crate. IMO we should instead actively engage with our "providers" by 
contributing upstream.
   
   @jorgecarleitao as you and @Dandandan have elucidated,  I think there are 
tradeoffs to the two approaches. I am happy with either approach in general -- 
I suggested the "include it in arrow approach" in this case as I felt it had 
less overhead for @sundy-li  who has already worked on this PR a great deal, 
but that may not be the best approach either short or long term.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9646: ARROW-11895: [Rust][DataFusion] Add support for more column statistics

2021-03-07 Thread GitBox


alamb commented on pull request #9646:
URL: https://github.com/apache/arrow/pull/9646#issuecomment-792324047


   FYI I merged https://github.com/apache/arrow/pull/9653 / 
[ARROW-11896](https://issues.apache.org/jira/browse/ARROW-11896) for the Rust 
CI checks which may affect this PR. If you see "Rust / AMD64 Debian 10 Rust 
stable test workspace" failing  with a linker error or no logs, rebasing 
against master will hopefully fix the problem



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


alamb commented on pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#issuecomment-792324018


   FYI I merged https://github.com/apache/arrow/pull/9653 / 
[ARROW-11896](https://issues.apache.org/jira/browse/ARROW-11896) for the Rust 
CI checks which may affect this PR. If you see "Rust / AMD64 Debian 10 Rust 
stable test workspace" failing  with a linker error or no logs, rebasing 
against master will hopefully fix the problem



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on pull request #9640: ARROW-11872: [C++] Fix Array validation when Array contains non-CPU buffers

2021-03-07 Thread GitBox


kiszk commented on pull request #9640:
URL: https://github.com/apache/arrow/pull/9640#issuecomment-792324014


   Looks good



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb closed pull request #9651: NONE [Rust] test change to trigger CI hang - TESTING NOT FOR MERGING

2021-03-07 Thread GitBox


alamb closed pull request #9651:
URL: https://github.com/apache/arrow/pull/9651


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9651: NONE [Rust] test change to trigger CI hang - TESTING NOT FOR MERGING

2021-03-07 Thread GitBox


alamb commented on pull request #9651:
URL: https://github.com/apache/arrow/pull/9651#issuecomment-792323441


   Real fix in https://github.com/apache/arrow/pull/9653



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb closed pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds

2021-03-07 Thread GitBox


alamb closed pull request #9653:
URL: https://github.com/apache/arrow/pull/9653


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


alamb commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792322711


   I still get backtraces on my linux machine (Ubuntu 20.04). :shipit: !



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ivanvankov commented on a change in pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


ivanvankov commented on a change in pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#discussion_r589064080



##
File path: rust/arrow/src/array/array_binary.rs
##
@@ -364,62 +365,135 @@ impl FixedSizeBinaryArray {
 self.data.buffers()[0].clone()
 }
 
-#[inline]
-fn value_offset_at(, i: usize) -> i32 {
-self.length * i as i32
-}
-}
+/// Create an array from an iterable argument of sparse byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_sparse_iter(mut iter: T) -> Result
+where
+T: Iterator>,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut byte = 0;
+let mut null_buf = MutableBuffer::from_len_zeroed(0);
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+let mut prepend = 0;
+iter.try_for_each(|item| -> Result<(), ArrowError> {
+if byte == 0 {
+null_buf.push(0);
+byte = 8;
+}
+byte -= 1;
+
+if let Some(sliceble) = item {
+let slice = sliceble.as_ref();
+if size != slice.len() {
+if size == 0 {
+size = slice.len();
+} else {
+return Err(ArrowError::InvalidArgumentError(format!(
+"Nested array size mismatch: one is {}, and the 
other is {}",
+size,
+slice.len()
+)));
+}
+}
+bit_util::set_bit(null_buf.as_slice_mut(), len);
+buffer.extend_from_slice(slice);
+} else {
+buffer.extend_zeros(size);
+if size == 0 {
+prepend += 1;
+}
+}
 
-impl From>> for FixedSizeBinaryArray {
-fn from(data: Vec>) -> Self {
-let len = data.len();
-assert!(len > 0);
-let size = data[0].len();
-assert!(data.iter().all(|item| item.len() == size));
-let data = data.into_iter().flatten().collect::>();
-let array_data = ArrayData::builder(DataType::FixedSizeBinary(size as 
i32))
-.len(len)
-.add_buffer(Buffer::from())
-.build();
-FixedSizeBinaryArray::from(array_data)
-}
-}
+len += 1;
 
-impl From>>> for FixedSizeBinaryArray {
-fn from(data: Vec>>) -> Self {
-let len = data.len();
-assert!(len > 0);
-// try to estimate the size. This may not be possible no entry is 
valid => panic
-let size = data.iter().filter_map(|e| 
e.as_ref()).next().unwrap().len();
-assert!(data
-.iter()
-.filter_map(|e| e.as_ref())
-.all(|item| item.len() == size));
-
-let num_bytes = bit_util::ceil(len, 8);
-let mut null_buf = MutableBuffer::from_len_zeroed(num_bytes);
-let null_slice = null_buf.as_slice_mut();
-
-data.iter().enumerate().for_each(|(i, entry)| {
-if entry.is_some() {
-bit_util::set_bit(null_slice, i);
+Ok(())
+})?;
+
+if len == 0 {
+return Err(ArrowError::InvalidArgumentError(
+"Input iterable argument has no data".to_owned(),
+));
+}
+
+if prepend > 0 {
+let extend_size = size * prepend;
+let copy_size = buffer.len();
+buffer.resize(copy_size + extend_size, 0);
+unsafe {
+let src = buffer.as_ptr();
+let dst = buffer.as_mut_ptr().add(extend_size);
+std::ptr::copy(src, dst, copy_size);
+buffer.as_mut_ptr().write_bytes(0, extend_size);
 }
-});
+}
 
-let data = data
-.into_iter()
-.flat_map(|e| e.unwrap_or_else(|| vec![0; size]))
-.collect::>();
-let data = ArrayData::new(
+let array_data = ArrayData::new(
 DataType::FixedSizeBinary(size as i32),
 len,
 None,
 Some(null_buf.into()),
 0,
-vec![Buffer::from()],
+vec![buffer.into()],
 vec![],
 );
-FixedSizeBinaryArray::from(Arc::new(data))
+Ok(FixedSizeBinaryArray::from(Arc::new(array_data)))
+}
+
+/// Create an array from an iterable argument of byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_iter(mut iter: T) -> Result
+where
+T: Iterator,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+ 

[GitHub] [arrow] nealrichardson commented on pull request #9483: ARROW-11610: [C++] Download boost from sourceforge instead of bintray

2021-03-07 Thread GitBox


nealrichardson commented on pull request #9483:
URL: https://github.com/apache/arrow/pull/9483#issuecomment-792322123


   I tried bumping boost to the latest (1.75.0), and that does successfully 
download from sourceforge. However, the build fails because of Thrift 0.12's 
use of deprecated boost endian headers (cf. 
https://issues.apache.org/jira/browse/THRIFT-4836), which were [removed in 
boost 1.73.0](https://github.com/boostorg/endian/issues/45). This was 
apparently [fixed in Thrift 
0.13](https://issues.apache.org/jira/browse/THRIFT-4861).
   
   I see three options:
   
   1. [Upgrade to thrift 
0.13](https://issues.apache.org/jira/browse/ARROW-8049) to allow newer boost. 
This was stalled before because thrift 0.13 apparently requires newer cmake 
than we do. https://issues.apache.org/jira/browse/THRIFT-5137 is the issue we 
created about that. It looks like an easy fix, which we could then apply as a 
patch step in our cmake build to allow 0.13.
   2. Apply the patch from THRIFT-4861 in our build_thrift cmake macro and keep 
going with thrift 0.12 while allowing us to upgrade to newer boost.
   3. Bump boost only to 1.72, the last version before the deprecated headers 
were removed, and don't bother with thrift. 
   
   IMO option 1 is best; I think thrift 0.13 also uses even less boost than 
0.12, so we could prune our boost bundle further. And I suspect it only 
requires a very small change. Option 3 is the least effort but just defers the 
maintenance burden.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


alamb commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792321059


   I plan to run this on a linux machine (to confirm the backtrace behavior and 
then merge it in)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


alamb commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792320959


   > We've already observed from our local repro attempts that there wasn't any 
specific PR that caused the issue. It could be that some dependency updates 
suddenly caused us to hit the memory limits on CI, as you're exploring.
   
   I think this is a likely theory but I don't have any specific data to prove 
one way or the other
   
   > Curious, when the CI fails, we will no longer have the trace, right? Isn't 
that a concern whenever we want to share logs e.g. on PRs, mailing lists, etc?
   
   I think we will still get backtraces even without "debug" symbols (the debug 
symbols are used by gdb / lldb to be able to interpret core dumps / map local 
variables to memory locations, etc). 
   
   To confirm this theory I ran the following (on my mac, will try on linux 
shortly) and the trace is still present
   ```
   (arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust$ RUST_BACKTRACE=1 
RUSTFLAGS="-C debuginfo=0" cargo test -p arrow -- test_null_list_primitive
   Finished test [unoptimized + debuginfo] target(s) in 0.10s
Running target/debug/deps/arrow-99caf14857691f1e
   
   running 1 test
   test array::array::tests::test_null_list_primitive ... FAILED
   
   failures:
   
    array::array::tests::test_null_list_primitive stdout 
   thread 'array::array::tests::test_null_list_primitive' panicked at 'test 
error', arrow/src/array/array.rs:621:9
   stack backtrace:
  0: std::panicking::begin_panic
  1: arrow::array::array::tests::test_null_list_primitive
  2: arrow::array::array::tests::test_null_list_primitive::{{closure}}
  3: core::ops::function::FnOnce::call_once
  4: core::ops::function::FnOnce::call_once
at 
/rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/ops/function.rs:227:5
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose 
backtrace.
   
   
   failures:
   array::array::tests::test_null_list_primitive
   
   test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 667 filtered 
out; finished in 0.14s
   
   error: test failed, to rerun pass '-p arrow --lib'
   ``` 
   
   > A compromise could be to disable debug symbols only on the Debian Rust 
stable runs, as the other platforms seem fine.
   
   Yeah -- in fact this PR doesn't set the debug flags for Windows or Mac (I 
believe the Mac builders are larger -- 15GB of memory on github, and windows 
debugging symbols work significantly different)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


nevi-me commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792312016


   > Curious, when the CI fails, we will no longer have the trace, right? Isn't 
that a concern whenever we want to share logs e.g. on PRs, mailing lists, etc?
   
   A compromise could be to disable debug symbols only on the Debian Rust 
stable runs, as the other platforms seem fine.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jbrockmendel commented on pull request #8957: ARROW-10768: [Python] pass ndim to pandas make_block

2021-03-07 Thread GitBox


jbrockmendel commented on pull request #8957:
URL: https://github.com/apache/arrow/pull/8957#issuecomment-792311293


   closing, as pandas can now deal with this internally.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jbrockmendel closed pull request #8957: ARROW-10768: [Python] pass ndim to pandas make_block

2021-03-07 Thread GitBox


jbrockmendel closed pull request #8957:
URL: https://github.com/apache/arrow/pull/8957


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


jorgecarleitao commented on a change in pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#discussion_r589056422



##
File path: rust/arrow/src/array/array_binary.rs
##
@@ -364,62 +365,135 @@ impl FixedSizeBinaryArray {
 self.data.buffers()[0].clone()
 }
 
-#[inline]
-fn value_offset_at(, i: usize) -> i32 {
-self.length * i as i32
-}
-}
+/// Create an array from an iterable argument of sparse byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_sparse_iter(mut iter: T) -> Result
+where
+T: Iterator>,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut byte = 0;
+let mut null_buf = MutableBuffer::from_len_zeroed(0);
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+let mut prepend = 0;
+iter.try_for_each(|item| -> Result<(), ArrowError> {
+if byte == 0 {
+null_buf.push(0);
+byte = 8;
+}
+byte -= 1;
+
+if let Some(sliceble) = item {
+let slice = sliceble.as_ref();
+if size != slice.len() {
+if size == 0 {
+size = slice.len();
+} else {
+return Err(ArrowError::InvalidArgumentError(format!(
+"Nested array size mismatch: one is {}, and the 
other is {}",
+size,
+slice.len()
+)));
+}
+}
+bit_util::set_bit(null_buf.as_slice_mut(), len);
+buffer.extend_from_slice(slice);
+} else {
+buffer.extend_zeros(size);
+if size == 0 {
+prepend += 1;
+}
+}
 
-impl From>> for FixedSizeBinaryArray {
-fn from(data: Vec>) -> Self {
-let len = data.len();
-assert!(len > 0);
-let size = data[0].len();
-assert!(data.iter().all(|item| item.len() == size));
-let data = data.into_iter().flatten().collect::>();
-let array_data = ArrayData::builder(DataType::FixedSizeBinary(size as 
i32))
-.len(len)
-.add_buffer(Buffer::from())
-.build();
-FixedSizeBinaryArray::from(array_data)
-}
-}
+len += 1;
 
-impl From>>> for FixedSizeBinaryArray {
-fn from(data: Vec>>) -> Self {
-let len = data.len();
-assert!(len > 0);
-// try to estimate the size. This may not be possible no entry is 
valid => panic
-let size = data.iter().filter_map(|e| 
e.as_ref()).next().unwrap().len();
-assert!(data
-.iter()
-.filter_map(|e| e.as_ref())
-.all(|item| item.len() == size));
-
-let num_bytes = bit_util::ceil(len, 8);
-let mut null_buf = MutableBuffer::from_len_zeroed(num_bytes);
-let null_slice = null_buf.as_slice_mut();
-
-data.iter().enumerate().for_each(|(i, entry)| {
-if entry.is_some() {
-bit_util::set_bit(null_slice, i);
+Ok(())
+})?;
+
+if len == 0 {
+return Err(ArrowError::InvalidArgumentError(
+"Input iterable argument has no data".to_owned(),
+));
+}
+
+if prepend > 0 {
+let extend_size = size * prepend;
+let copy_size = buffer.len();
+buffer.resize(copy_size + extend_size, 0);
+unsafe {
+let src = buffer.as_ptr();
+let dst = buffer.as_mut_ptr().add(extend_size);
+std::ptr::copy(src, dst, copy_size);
+buffer.as_mut_ptr().write_bytes(0, extend_size);
 }
-});
+}
 
-let data = data
-.into_iter()
-.flat_map(|e| e.unwrap_or_else(|| vec![0; size]))
-.collect::>();
-let data = ArrayData::new(
+let array_data = ArrayData::new(
 DataType::FixedSizeBinary(size as i32),
 len,
 None,
 Some(null_buf.into()),
 0,
-vec![Buffer::from()],
+vec![buffer.into()],
 vec![],
 );
-FixedSizeBinaryArray::from(Arc::new(data))
+Ok(FixedSizeBinaryArray::from(Arc::new(array_data)))
+}
+
+/// Create an array from an iterable argument of byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_iter(mut iter: T) -> Result
+where
+T: Iterator,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+ 

[GitHub] [arrow] jorgecarleitao commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


jorgecarleitao commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792306123


   Curious, when the CI fails, we will no longer have the trace, right? Isn't 
that a concern whenever we want to share logs e.g. on PRs, mailing lists, etc?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

2021-03-07 Thread GitBox


jorgecarleitao commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792303834


   Thanks for that, @Dandandan, all great points  , to which I generally agree 
with.
   
   > It would be easier to review the code / run Miri checks, etc. when it was 
included in the PR
   
   I agree. My concern atm is that our own crate does not pass miri checks (our 
miri checks are `cargo miri test || true`), thus, any potential soundness 
issues new code may have cannot be identified by our own CI. I only recently 
was able to have [a version of a arrow 
crate](https://github.com/jorgecarleitao/arrow2) to pass miri checks, and I am 
still [deactivating some 
checks](https://github.com/jorgecarleitao/arrow2/blob/main/.github/workflows/test.yml#L160)
 and investigating. My hypothesis is that it is very difficult to write `safe` 
code in this crate's design and thus my state of mind is that our best bet is 
to push the soundness checks about specific parts of the code to places where 
those can be done in a more contained environment (i.e. without all the 
transmutes and the like that we do here). We should still of course try to fix 
our own miri checks. xD
   
   > Build times become larger compared to having the same code in the project 
(as the crate will be downloaded / compiled separately). Of course one extra 
small dependency won't matter that much, but it accumulates with lots of 
dependencies.
   
   I agree. And I think that this is already a problem for DataFusion and 
parquet. I do get the feeling that the culprits there are crossbeam, tokio, 
tower and the like. These small crates seem to be downloaded and compiled in 
parallel and thus have a small impact when compared to the big beasts that 
block compilation (i.e. we have unbalanced-sized tasks). In arrows' case, the 
largest is flatbuffer I think, which we could feature-gate under e.g. `io_ipc` 
in case people do not need that. We can't get away with these during testing, 
but I think it is by design; if we want, IMO we need a new compilation unit 
(e.g. a crate).
   
   > Has some risk that the crated will become unmaintained at some point, but 
causes a bug / security problem, etc. in Arrow. Or we would like to improve it, 
but a crate maintainer might be unresponsive. A recent example of this I saw in 
the memmap crate which now has a fork memmap2 which is quite hard to find as 
the docs / repo of memmap are still online.
   
   I agree. That IMO should be sufficient for us to bring the crate to arrow.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] codecov-io edited a comment on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


codecov-io edited a comment on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792286831


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=h1) Report
   > Merging 
[#9653](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=desc) (7bd587e) 
into 
[master](https://codecov.io/gh/apache/arrow/commit/dec5ab9677a0dfb9e324a57ccf5550217a2117be?el=desc)
 (dec5ab9) will **decrease** coverage by `0.01%`.
   > The diff coverage is `56.25%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow/pull/9653/graphs/tree.svg?width=650=150=pr=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#9653  +/-   ##
   ==
   - Coverage   82.51%   82.50%   -0.02% 
   ==
 Files 245  245  
 Lines   5732957347  +18 
   ==
   + Hits4730647312   +6 
   - Misses  1002310035  +12 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==)
 | `92.88% <ø> (-0.07%)` | :arrow_down: |
   | 
[rust/arrow/src/datatypes/native.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvZGF0YXR5cGVzL25hdGl2ZS5ycw==)
 | `76.59% <22.22%> (-12.88%)` | :arrow_down: |
   | 
[rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz)
 | `91.82% <100.00%> (ø)` | |
   | 
[rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz)
 | `95.00% <100.00%> (+0.04%)` | :arrow_up: |
   | 
[rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz)
 | `96.06% <100.00%> (ø)` | |
   | 
[rust/datafusion/src/physical\_plan/merge.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL21lcmdlLnJz)
 | `76.56% <100.00%> (ø)` | |
   | 
[rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=)
 | `95.05% <0.00%> (+0.19%)` | :arrow_up: |
   | 
[rust/arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWwvdXRpbHMucnM=)
 | `76.47% <0.00%> (+0.98%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=footer). Last 
update 
[2585f7c...7bd587e](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


nevi-me commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792300905


   @alamb this seems to have worked. I'm triggering the jobs to run again, if 
it passes again; it's a   from me.
   
   We've already observed from our local repro attempts that there wasn't any 
specific PR that caused the issue. It could be that some dependency updates 
suddenly caused us to hit the memory limits on CI, as you're exploring.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ivanvankov commented on pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


ivanvankov commented on pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#issuecomment-792297220


   > I also think so, and also because no `unsafe` would be used. Our 
guidelines about `unsafe` is that there must be good justification for it, and 
I am trying to understand what is the justification exactly.
   > 
   > E.g. since we have to validate that every entries' size are equal, 
couldn't we do it directly? e.g. something like this:
   > 
   > ```rust
   >  Some(v) => {  // item is not null
   > let bytes = *v;
   > if let Some(size) = size {
   > if size != bytes.len() {
   > return Err(ArrowError::...);
   > }
   > } else {
   > // first time we see a len
   > size = Some(bytes.len());
   > self.values
   > .extend_constant(![0; bytes.len() * 
current_validity]);
   > };
   > self.values.extend_from_slice(bytes);
   > self.validity.push(true);
   > ```
   > 
   > where `current_validity` tracks the number of nones up to when we get a 
size, and `size: Option` represents the current size (if known).
   
   You are actually absolutely right, no `unsafe` needed. Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ivanvankov commented on a change in pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


ivanvankov commented on a change in pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#discussion_r589046553



##
File path: rust/arrow/src/array/array_binary.rs
##
@@ -364,62 +365,135 @@ impl FixedSizeBinaryArray {
 self.data.buffers()[0].clone()
 }
 
-#[inline]
-fn value_offset_at(, i: usize) -> i32 {
-self.length * i as i32
-}
-}
+/// Create an array from an iterable argument of sparse byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_sparse_iter(mut iter: T) -> Result
+where
+T: Iterator>,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut byte = 0;
+let mut null_buf = MutableBuffer::from_len_zeroed(0);
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+let mut prepend = 0;
+iter.try_for_each(|item| -> Result<(), ArrowError> {
+if byte == 0 {
+null_buf.push(0);
+byte = 8;
+}
+byte -= 1;
+
+if let Some(sliceble) = item {
+let slice = sliceble.as_ref();
+if size != slice.len() {
+if size == 0 {
+size = slice.len();
+} else {
+return Err(ArrowError::InvalidArgumentError(format!(
+"Nested array size mismatch: one is {}, and the 
other is {}",
+size,
+slice.len()
+)));
+}
+}
+bit_util::set_bit(null_buf.as_slice_mut(), len);
+buffer.extend_from_slice(slice);
+} else {
+buffer.extend_zeros(size);
+if size == 0 {
+prepend += 1;
+}
+}
 
-impl From>> for FixedSizeBinaryArray {
-fn from(data: Vec>) -> Self {
-let len = data.len();
-assert!(len > 0);
-let size = data[0].len();
-assert!(data.iter().all(|item| item.len() == size));
-let data = data.into_iter().flatten().collect::>();
-let array_data = ArrayData::builder(DataType::FixedSizeBinary(size as 
i32))
-.len(len)
-.add_buffer(Buffer::from())
-.build();
-FixedSizeBinaryArray::from(array_data)
-}
-}
+len += 1;
 
-impl From>>> for FixedSizeBinaryArray {
-fn from(data: Vec>>) -> Self {
-let len = data.len();
-assert!(len > 0);
-// try to estimate the size. This may not be possible no entry is 
valid => panic
-let size = data.iter().filter_map(|e| 
e.as_ref()).next().unwrap().len();
-assert!(data
-.iter()
-.filter_map(|e| e.as_ref())
-.all(|item| item.len() == size));
-
-let num_bytes = bit_util::ceil(len, 8);
-let mut null_buf = MutableBuffer::from_len_zeroed(num_bytes);
-let null_slice = null_buf.as_slice_mut();
-
-data.iter().enumerate().for_each(|(i, entry)| {
-if entry.is_some() {
-bit_util::set_bit(null_slice, i);
+Ok(())
+})?;
+
+if len == 0 {
+return Err(ArrowError::InvalidArgumentError(
+"Input iterable argument has no data".to_owned(),
+));
+}
+
+if prepend > 0 {
+let extend_size = size * prepend;
+let copy_size = buffer.len();
+buffer.resize(copy_size + extend_size, 0);
+unsafe {
+let src = buffer.as_ptr();
+let dst = buffer.as_mut_ptr().add(extend_size);
+std::ptr::copy(src, dst, copy_size);
+buffer.as_mut_ptr().write_bytes(0, extend_size);
 }
-});
+}
 
-let data = data
-.into_iter()
-.flat_map(|e| e.unwrap_or_else(|| vec![0; size]))
-.collect::>();
-let data = ArrayData::new(
+let array_data = ArrayData::new(
 DataType::FixedSizeBinary(size as i32),
 len,
 None,
 Some(null_buf.into()),
 0,
-vec![Buffer::from()],
+vec![buffer.into()],
 vec![],
 );
-FixedSizeBinaryArray::from(Arc::new(data))
+Ok(FixedSizeBinaryArray::from(Arc::new(array_data)))
+}
+
+/// Create an array from an iterable argument of byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_iter(mut iter: T) -> Result
+where
+T: Iterator,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+ 

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


jorgecarleitao commented on a change in pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#discussion_r589038582



##
File path: rust/arrow/src/array/array_binary.rs
##
@@ -364,62 +365,135 @@ impl FixedSizeBinaryArray {
 self.data.buffers()[0].clone()
 }
 
-#[inline]
-fn value_offset_at(, i: usize) -> i32 {
-self.length * i as i32
-}
-}
+/// Create an array from an iterable argument of sparse byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_sparse_iter(mut iter: T) -> Result
+where
+T: Iterator>,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut byte = 0;
+let mut null_buf = MutableBuffer::from_len_zeroed(0);
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+let mut prepend = 0;
+iter.try_for_each(|item| -> Result<(), ArrowError> {
+if byte == 0 {
+null_buf.push(0);
+byte = 8;
+}
+byte -= 1;
+
+if let Some(sliceble) = item {
+let slice = sliceble.as_ref();
+if size != slice.len() {
+if size == 0 {
+size = slice.len();
+} else {
+return Err(ArrowError::InvalidArgumentError(format!(
+"Nested array size mismatch: one is {}, and the 
other is {}",
+size,
+slice.len()
+)));
+}
+}
+bit_util::set_bit(null_buf.as_slice_mut(), len);
+buffer.extend_from_slice(slice);
+} else {
+buffer.extend_zeros(size);
+if size == 0 {
+prepend += 1;
+}
+}
 
-impl From>> for FixedSizeBinaryArray {
-fn from(data: Vec>) -> Self {
-let len = data.len();
-assert!(len > 0);
-let size = data[0].len();
-assert!(data.iter().all(|item| item.len() == size));
-let data = data.into_iter().flatten().collect::>();
-let array_data = ArrayData::builder(DataType::FixedSizeBinary(size as 
i32))
-.len(len)
-.add_buffer(Buffer::from())
-.build();
-FixedSizeBinaryArray::from(array_data)
-}
-}
+len += 1;
 
-impl From>>> for FixedSizeBinaryArray {
-fn from(data: Vec>>) -> Self {
-let len = data.len();
-assert!(len > 0);
-// try to estimate the size. This may not be possible no entry is 
valid => panic
-let size = data.iter().filter_map(|e| 
e.as_ref()).next().unwrap().len();
-assert!(data
-.iter()
-.filter_map(|e| e.as_ref())
-.all(|item| item.len() == size));
-
-let num_bytes = bit_util::ceil(len, 8);
-let mut null_buf = MutableBuffer::from_len_zeroed(num_bytes);
-let null_slice = null_buf.as_slice_mut();
-
-data.iter().enumerate().for_each(|(i, entry)| {
-if entry.is_some() {
-bit_util::set_bit(null_slice, i);
+Ok(())
+})?;
+
+if len == 0 {
+return Err(ArrowError::InvalidArgumentError(
+"Input iterable argument has no data".to_owned(),
+));
+}
+
+if prepend > 0 {
+let extend_size = size * prepend;
+let copy_size = buffer.len();
+buffer.resize(copy_size + extend_size, 0);
+unsafe {
+let src = buffer.as_ptr();
+let dst = buffer.as_mut_ptr().add(extend_size);
+std::ptr::copy(src, dst, copy_size);
+buffer.as_mut_ptr().write_bytes(0, extend_size);
 }
-});
+}
 
-let data = data
-.into_iter()
-.flat_map(|e| e.unwrap_or_else(|| vec![0; size]))
-.collect::>();
-let data = ArrayData::new(
+let array_data = ArrayData::new(
 DataType::FixedSizeBinary(size as i32),
 len,
 None,
 Some(null_buf.into()),
 0,
-vec![Buffer::from()],
+vec![buffer.into()],
 vec![],
 );
-FixedSizeBinaryArray::from(Arc::new(data))
+Ok(FixedSizeBinaryArray::from(Arc::new(array_data)))
+}
+
+/// Create an array from an iterable argument of byte slices.
+///
+/// # Errors
+///
+/// Returns error if argument has length zero, or sizes of nested slices 
don't match.
+pub fn try_from_iter(mut iter: T) -> Result
+where
+T: Iterator,
+U: AsRef<[u8]>,
+{
+let mut len = 0;
+let mut size = 0;
+let mut buffer = MutableBuffer::from_len_zeroed(0);
+ 

[GitHub] [arrow] codecov-io commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds - WORK IN PROGRESS

2021-03-07 Thread GitBox


codecov-io commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792286831


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=h1) Report
   > Merging 
[#9653](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=desc) (7bd587e) 
into 
[master](https://codecov.io/gh/apache/arrow/commit/dec5ab9677a0dfb9e324a57ccf5550217a2117be?el=desc)
 (dec5ab9) will **decrease** coverage by `0.01%`.
   > The diff coverage is `56.25%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow/pull/9653/graphs/tree.svg?width=650=150=pr=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#9653  +/-   ##
   ==
   - Coverage   82.51%   82.50%   -0.02% 
   ==
 Files 245  245  
 Lines   5732957347  +18 
   ==
   + Hits4730647312   +6 
   - Misses  1002310035  +12 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==)
 | `92.88% <ø> (-0.07%)` | :arrow_down: |
   | 
[rust/arrow/src/datatypes/native.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvZGF0YXR5cGVzL25hdGl2ZS5ycw==)
 | `76.59% <22.22%> (-12.88%)` | :arrow_down: |
   | 
[rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz)
 | `91.82% <100.00%> (ø)` | |
   | 
[rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz)
 | `95.00% <100.00%> (+0.04%)` | :arrow_up: |
   | 
[rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz)
 | `96.06% <100.00%> (ø)` | |
   | 
[rust/datafusion/src/physical\_plan/merge.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL21lcmdlLnJz)
 | `76.56% <100.00%> (ø)` | |
   | 
[rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=)
 | `95.05% <0.00%> (+0.19%)` | :arrow_up: |
   | 
[rust/arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow/pull/9653/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWwvdXRpbHMucnM=)
 | `76.47% <0.00%> (+0.98%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=footer). Last 
update 
[2585f7c...7bd587e](https://codecov.io/gh/apache/arrow/pull/9653?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] Dandandan commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

2021-03-07 Thread GitBox


Dandandan commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792286475


   @jorgecarleitao 
   
   I'm perfectly fine with that approach too.
   
   I do think that depending on crates that are useful instead of building our 
own or even duplicating code is what we should do in general, and indeed will 
benefit others in the Rust community. 
   
   My reasoning behind my suggestion above was
   
   * The crate has no currently no users and (I think) only 1 maintainer.
   * The code size is small
   * The PR author is the same as the maintainer of the crate (and I guess it 
was created for this use case for now?)
   * It would be easier to review the code / run Miri checks, etc. when it was 
included in the PR
   
   Of course, there might be some other parties that would be interested in 
`partial_sort` or maybe it will at some point make it to the standard library.
   
   Also dependencies have *some* costs, some of them being:
   
   * Has some risk that the crated will become unmaintained at some point, but 
causes a bug / security problem, etc. in Arrow. Or we would like to improve it, 
but a crate maintainer might be unresponsive. A recent example of this I saw in 
the `memmap` crate which now has a fork `memmap2` which is quite hard to find 
as the docs / repo of `memmap` are still online.
   * Non-semver breaking changes can not be caught in PRs
   * Build times become larger compared to having the same code in the project 
(as the crate will be downloaded / compiled separately). Of course one extra 
small dependency won't matter that much, but it accumulates with lots of 
dependencies.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ivanvankov commented on pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


ivanvankov commented on pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#issuecomment-792285533


   Hi @jorgecarleitao, and thank you for the extended review and explanations.
   Now, regarding comments you have:
   
   - It does actually push just one byte per each 8 items. But here I've 
noticed a mistake :-) 
   `null_buf.push(0)` actually pushes 4 bytes, so I'll change it to 
`null_buf.push(0u8)` to fix this. The invariants are still held though.
   
   - Regarding invariants I don't see anything that breaks them in my 
implementation. If you see anything please point to it.
   
   - The part with `unsafe` is for the case when iterator returns `None` as 
first item. Since we need to add to buffer a sequence of zeros of length `size` 
we need to know `size` to do that. But we still don't know `size` value, we 
need to get to the first `Some` to determine it. So, I only save count of 
`None` elements until iterator returns first `Some`, and don't add to buffer 
anything until then. When iterator stops we need to add `size * number of None` 
zeros in the beginning of buffer, and for that the section with `usafe` is 
added, just to copy from one position to other and put zeros. So, I need to 
prepend data, not to append, that's why `MutableBuffer::extend_from_slice` 
can't be used. 
   I think, for more performance it would be better to prepend immediately 
after we determined `size`, not after iterator finished.
   
   - I'll add examples to documentation section for the functions.
   
   Feel free to add more remarks if you have any. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds

2021-03-07 Thread GitBox


github-actions[bot] commented on pull request #9653:
URL: https://github.com/apache/arrow/pull/9653#issuecomment-792283078


   https://issues.apache.org/jira/browse/ARROW-11896



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb opened a new pull request #9653: ARROW-11896: [Rust] Disable Debug symbols on CI test builds

2021-03-07 Thread GitBox


alamb opened a new pull request #9653:
URL: https://github.com/apache/arrow/pull/9653


   The theory is that the inclusion of debug symbols is increasing the memory 
requirements of compiling the test binaries which is causing the tests to hit 
the CI builder's memory limits (and being OOM killed). 
   
   Changes:
   Since the debug symbols aren't used for tests, run the CI tests without them 
to save memory
   
   In theory this might also make the builds faster as well



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

2021-03-07 Thread GitBox


jorgecarleitao commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792275857


   Thanks @sundy-li for the explanation.
   
   > The partial_sort crate seems also quite small (about 60 lines without 
tests/imports). What about having the code in Arrow for now, and (try to) limit 
the amount of unsafety @sundy-li and we can have a next review iteration?
   
   I do not follow the argument. Isn't the whole purpose of crates and 
dependencies to separate concerns and allow code reuse across the ecosystem 
without dragging dependencies?
   
   From what I've seen, the crate:
   
   * addresses one problem and one alone (unlike our repo or our workspace)
   * has CI and CD in place
   * it is easy to publish in crates.io (via a github action) which makes CD 
also easy
   * it is duo license, as it is the standard in Rust
   * it uses SemVer
   
   If I were @sundy-li, I would keep the setup exactly like it is: it enables 
the implementation to be used by more projects without having to drag arrow and 
its dependencies with it, while at the same time maintains the generality that 
the algorithm itself has. I would be behaving exactly how @sundy-li is 
behaving: engage with potential "customers" like Arrow to identify value, how 
it can be adopted, and what are the points to improve.
   
   From our side, I am not sure I agree with copy-pasting code from a crate to 
our own crate. IMO we should instead actively engage with our "providers" by 
contributing upstream. In this particular case, one step is to have MIRI run on 
CI. The other is to either limit or document the usage of `unsafe`. This way, 
the Rust community as a whole benefits from these, instead of the benefit be 
limited to users of the Arrow crate. I created a PR upstream to address one of 
the steps: https://github.com/sundy-li/partial_sort/pull/1 .
   
   Side note: @alamb , what just happened (on which I could trivially PR 
upstream without any complications, issues, etc.) is a concrete example of what 
I have been trying to express on the mailing list wrt to the UX that IMO our 
repo lacks: I can't do that on this repo.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9652: [ARROW-11896]: Run doc tests with a single thread -- DRAFT

2021-03-07 Thread GitBox


github-actions[bot] commented on pull request #9652:
URL: https://github.com/apache/arrow/pull/9652#issuecomment-792268823


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb opened a new pull request #9652: [ARROW-11896Run doc tests with a single thread

2021-03-07 Thread GitBox


alamb opened a new pull request #9652:
URL: https://github.com/apache/arrow/pull/9652


   testing rejiggering the tests so they don't run the doc tests in parallel -- 
related to https://issues.apache.org/jira/browse/ARROW-11896



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

2021-03-07 Thread GitBox


alamb commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792265078


   > In ClickHouse, PartialSortingTransform(Each block in each thread) 
   
   This is a very cool idea (effectively to compute top-k for each block in 
parallel and then merge them together). Thank you for the explanation @sundy-li 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9647: ARROW-10903 [Rust] Implement FromIter>> constructor for FixedSizeBinaryArray

2021-03-07 Thread GitBox


alamb commented on pull request #9647:
URL: https://github.com/apache/arrow/pull/9647#issuecomment-792264645


   BTW the test workspace check is also failing on master, so it may not be 
related to this PR. See more details on 
https://issues.apache.org/jira/browse/ARROW-11896
   
   The integration test failure looks strange (we had fixed something like that 
previously and it looks like your branch has the fix).  I retriggered the test 
jobs to see if I could get a passing run



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9232: ARROW-10818: [Rust] Implement DecimalType

2021-03-07 Thread GitBox


alamb commented on pull request #9232:
URL: https://github.com/apache/arrow/pull/9232#issuecomment-792264052


   BTW the test workspace check is  failing on master. See more details on 
https://issues.apache.org/jira/browse/ARROW-11896; I don't want you to waste 
your time hunting down a bug that is not yours.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9646: ARROW-11895: [Rust][DataFusion] Add support for more column statistics

2021-03-07 Thread GitBox


alamb commented on pull request #9646:
URL: https://github.com/apache/arrow/pull/9646#issuecomment-792263960


   BTW the test workspace check is also failing on master, so it may not be 
related to this PR. See more details on 
https://issues.apache.org/jira/browse/ARROW-11896



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] li-wu edited a comment on pull request #9450: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings

2021-03-07 Thread GitBox


li-wu edited a comment on pull request #9450:
URL: https://github.com/apache/arrow/pull/9450#issuecomment-792167454


   Any update for the PR? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on pull request #9645: ARROW-11894: [Rust][DataFusion] Change flight server example to use DataFrame API

2021-03-07 Thread GitBox


alamb commented on pull request #9645:
URL: https://github.com/apache/arrow/pull/9645#issuecomment-792263737


   BTW the test workspace check is also failing on master, so it may not be 
related to this PR. See more details on 
https://issues.apache.org/jira/browse/ARROW-11896



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] Dandandan commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

2021-03-07 Thread GitBox


Dandandan commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792262775


   I think there are some very nice use-cases of this optimization (top-k kind 
of queries), I think it would be very cool to have more advanced features like 
this in Arrow/DataFusion/...
   
   The partial_sort crate seems also quite small (about 60 lines without 
tests/imports). What about having the code in Arrow for now, and (try to) limit 
the amount of unsafety @sundy-li and we can have a next review iteration?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9651: NONE [Rust] test change to trigger CI hang - TESTING NOT FOR MERGING

2021-03-07 Thread GitBox


github-actions[bot] commented on pull request #9651:
URL: https://github.com/apache/arrow/pull/9651#issuecomment-792259384


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb opened a new pull request #9651: NONE [Rust] test change to trigger CI hang - TESTING NOT FOR MERGING

2021-03-07 Thread GitBox


alamb opened a new pull request #9651:
URL: https://github.com/apache/arrow/pull/9651


   Please ignore -- I am trying to debug a hang on CI / 
https://issues.apache.org/jira/browse/ARROW-11630



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] codecov-io edited a comment on pull request #9646: ARROW-11895: [Rust][DataFusion] Add support for more column statistics

2021-03-07 Thread GitBox


codecov-io edited a comment on pull request #9646:
URL: https://github.com/apache/arrow/pull/9646#issuecomment-791918396


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9646?src=pr=h1) Report
   > Merging 
[#9646](https://codecov.io/gh/apache/arrow/pull/9646?src=pr=desc) (b5bafa5) 
into 
[master](https://codecov.io/gh/apache/arrow/commit/bfa99d98e6f0b49aed079d9f99668390f84a933f?el=desc)
 (bfa99d9) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow/pull/9646/graphs/tree.svg?width=650=150=pr=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9646?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#9646   +/-   ##
   ===
 Coverage   82.49%   82.50%   
   ===
 Files 245  245   
 Lines   5734757375   +28 
   ===
   + Hits4731147339   +28 
 Misses  1003610036   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow/pull/9646?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[rust/datafusion/src/datasource/datasource.rs](https://codecov.io/gh/apache/arrow/pull/9646/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL2RhdGFzb3VyY2UucnM=)
 | `100.00% <ø> (ø)` | |
   | 
[rust/datafusion/src/datasource/memory.rs](https://codecov.io/gh/apache/arrow/pull/9646/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL21lbW9yeS5ycw==)
 | `85.15% <100.00%> (+1.04%)` | :arrow_up: |
   | 
[rust/datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow/pull/9646/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BhcnF1ZXQucnM=)
 | `88.08% <100.00%> (+0.24%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow/pull/9646?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow/pull/9646?src=pr=footer). Last 
update 
[bfa99d9...b5bafa5](https://codecov.io/gh/apache/arrow/pull/9646?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] Dandandan commented on pull request #9646: ARROW-11895: [Rust][DataFusion] Add support for more column statistics

2021-03-07 Thread GitBox


Dandandan commented on pull request #9646:
URL: https://github.com/apache/arrow/pull/9646#issuecomment-792250236


   FYI @andygrove I plan to use this later for some `ANALYZE / COMPUTE 
STATISTICS` support (would be useful for in memory data) and/or use parquet 
statistics to make the join order optimization more advanced.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] codecov-io commented on pull request #9650: ARROW-11898: [Rust] Pretty print columns

2021-03-07 Thread GitBox


codecov-io commented on pull request #9650:
URL: https://github.com/apache/arrow/pull/9650#issuecomment-792249718


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9650?src=pr=h1) Report
   > Merging 
[#9650](https://codecov.io/gh/apache/arrow/pull/9650?src=pr=desc) (e5c8fe5) 
into 
[master](https://codecov.io/gh/apache/arrow/commit/dec5ab9677a0dfb9e324a57ccf5550217a2117be?el=desc)
 (dec5ab9) will **decrease** coverage by `0.01%`.
   > The diff coverage is `71.42%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/arrow/pull/9650/graphs/tree.svg?width=650=150=pr=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9650?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#9650  +/-   ##
   ==
   - Coverage   82.51%   82.50%   -0.02% 
   ==
 Files 245  245  
 Lines   5732957378  +49 
   ==
   + Hits4730647338  +32 
   - Misses  1002310040  +17 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/arrow/pull/9650?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==)
 | `92.88% <ø> (-0.07%)` | :arrow_down: |
   | 
[rust/arrow/src/datatypes/native.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvZGF0YXR5cGVzL25hdGl2ZS5ycw==)
 | `76.59% <22.22%> (-12.88%)` | :arrow_down: |
   | 
[rust/arrow/src/util/pretty.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC9wcmV0dHkucnM=)
 | `94.77% <87.09%> (-2.32%)` | :arrow_down: |
   | 
[rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz)
 | `91.82% <100.00%> (ø)` | |
   | 
[rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz)
 | `95.00% <100.00%> (+0.04%)` | :arrow_up: |
   | 
[rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz)
 | `96.06% <100.00%> (ø)` | |
   | 
[rust/datafusion/src/physical\_plan/merge.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL21lcmdlLnJz)
 | `76.56% <100.00%> (ø)` | |
   | 
[rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9650/diff?src=pr=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=)
 | `95.05% <0.00%> (+0.19%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/arrow/pull/9650?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/arrow/pull/9650?src=pr=footer). Last 
update 
[2585f7c...e5c8fe5](https://codecov.io/gh/apache/arrow/pull/9650?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on a change in pull request #9650: ARROW-11898: [Rust] Pretty print columns

2021-03-07 Thread GitBox


nevi-me commented on a change in pull request #9650:
URL: https://github.com/apache/arrow/pull/9650#discussion_r589002450



##
File path: .github/workflows/rust.yml
##
@@ -110,7 +110,7 @@ jobs:
   export CARGO_TARGET_DIR="/github/home/target"
   cd rust
   # run tests on all workspace members with default feature list
-  cargo test
+  # cargo test

Review comment:
   I'm piggybacking off this PR to try find the cause for the CI hang





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9650: ARROW-11898: [Rust] Pretty print columns

2021-03-07 Thread GitBox


github-actions[bot] commented on pull request #9650:
URL: https://github.com/apache/arrow/pull/9650#issuecomment-792246946


   https://issues.apache.org/jira/browse/ARROW-11898



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me opened a new pull request #9650: ARROW-11898: [Rust] Pretty print columns

2021-03-07 Thread GitBox


nevi-me opened a new pull request #9650:
URL: https://github.com/apache/arrow/pull/9650


   Adds a way to also pretty-print a slice of columns, as a convenience to 
aviod creating a single-column record batch.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org