[jira] [Assigned] (ARROW-6601) [Java] Add benchmark for JDBC adapter to avoid potential regression

2019-09-22 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-6601:
-

Assignee: Ji Liu

> [Java] Add benchmark for JDBC adapter to avoid potential regression
> ---
>
> Key: ARROW-6601
> URL: https://issues.apache.org/jira/browse/ARROW-6601
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>
> Add a performance test to establish a baseline number, so that performance 
> regressions can be caught when we change related code.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot

2019-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6661:
--
Labels: pull-request-available  (was: )

> [Java] Implement APIs like slice to enhance VectorSchemaRoot
> 
>
> Key: ARROW-6661
> URL: https://issues.apache.org/jira/browse/ARROW-6661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
>
> Unlike C++/Python, the Java implementation currently has no APIs like slice 
> for record batches.
> This issue is to implement slice/getVector/addVector/removeVector.





[jira] [Created] (ARROW-6664) [C++] Add option to build without SSE4.2

2019-09-22 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6664:
---

 Summary: [C++] Add option to build without SSE4.2
 Key: ARROW-6664
 URL: https://issues.apache.org/jira/browse/ARROW-6664
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0


Child task of ARROW-5381





[jira] [Assigned] (ARROW-5381) [C++] Crash at arrow::internal::CountSetBits

2019-09-22 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5381:
---

Assignee: Wes McKinney

> [C++] Crash at arrow::internal::CountSetBits
> 
>
> Key: ARROW-5381
> URL: https://issues.apache.org/jira/browse/ARROW-5381
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Operating System: Windows 7 Professional 64-bit (6.1, 
> Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429)
> Language: English (Regional Setting: English)
> System Manufacturer: SAMSUNG ELECTRONICS CO., LTD.
> System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520
> BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ
> Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz
> Memory: 2048MB RAM
> Available OS Memory: 1962MB RAM
>   Page File: 1517MB used, 2405MB available
> Windows Dir: C:\Windows
> DirectX Version: DirectX 11
>Reporter: Tham
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Attachments: bit-util.asm, iMac-late2009.png, popcnt_support.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I've got a lot of crash dumps from a customer's Windows machine. The 
> stack trace shows that it crashed at arrow::internal::CountSetBits.
>  
> {code:java}
> STACK_TEXT:  
> 00c9`5354a4c0 7ff7`2f2830fd : 00c9`544841c0 ` 
> `1e00 ` : 
> CortexService!arrow::internal::CountSetBits+0x16d
> 00c9`5354a550 7ff7`2f2834b7 : 00c9`5337c930 ` 
> ` ` : 
> CortexService!arrow::ArrayData::GetNullCount+0x8d
> 00c9`5354a580 7ff7`2f13df55 : 00c9`54476080 00c9`5354a5d8 
> ` ` : 
> CortexService!arrow::Array::null_count+0x37
> 00c9`5354a5b0 7ff7`2f13fb68 : 00c9`5354ab40 00c9`5354a6f8 
> 00c9`54476080 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::Visit >+0xa5
> 00c9`5354a640 7ff7`2f12fa34 : 00c9`5354a6f8 00c9`54476080 
> 00c9`5354ab40 ` : 
> CortexService!arrow::VisitArrayInline namespace'::LevelBuilder>+0x298
> 00c9`5354a680 7ff7`2f14bf03 : 00c9`5354ab40 00c9`5354a6f8 
> 00c9`54476080 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::VisitInline+0x44
> 00c9`5354a6c0 7ff7`2f12fe2a : 00c9`5354ab40 00c9`5354ae18 
> 00c9`54476080 00c9`5354b208 : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::GenerateLevels+0x93
> 00c9`5354aa00 7ff7`2f14de56 : 00c9`5354b1f8 00c9`5354afc8 
> 00c9`54476080 `1e00 : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::ArrowColumnWriter::Write+0x25a
> 00c9`5354af20 7ff7`2f14e66b : 00c9`5354b1f8 00c9`5354b238 
> 00c9`54445c20 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::ArrowColumnWriter::Write+0x2a6
> 00c9`5354b040 7ff7`2f12f137 : 00c9`544041f0 00c9`5354b4d8 
> 00c9`5354b4a8 ` : 
> CortexService!parquet::arrow::FileWriter::Impl::WriteColumnChunk+0x70b
> 00c9`5354b400 7ff7`2f14b4d5 : 00c9`54431180 00c9`5354b4d8 
> 00c9`5354b4a8 ` : 
> CortexService!parquet::arrow::FileWriter::WriteColumnChunk+0x67
> 00c9`5354b450 7ff7`2f12eef1 : 00c9`5354b5d8 00c9`5354b648 
> ` `1e00 : 
> CortexService!::operator()+0x195
> 00c9`5354b530 7ff7`2eb8e31e : 00c9`54431180 00c9`5354b760 
> 00c9`54442fb0 `1e00 : 
> CortexService!parquet::arrow::FileWriter::WriteTable+0x521
> 00c9`5354b730 7ff7`2eb58ac5 : 00c9`5307bd88 00c9`54442fb0 
> ` ` : 
> CortexService!Cortex::Storage::ParquetStreamWriter::writeRowGroup+0xfe
> 00c9`5354b860 7ff7`2eafdce6 : 00c9`5307bd80 00c9`5354ba08 
> 00c9`5354b9e0 00c9`5354b9d8 : 
> CortexService!Cortex::Storage::ParquetFileWriter::writeRowGroup+0x545
> 00c9`5354b9a0 7ff7`2eaf8bae : 00c9`53275600 00c9`53077220 
> `fffe ` : 
> CortexService!Cortex::Storage::DataStreamWriteWorker::onNewData+0x1a6
> {code}
> {code:java}
> FAILED_INSTRUCTION_ADDRESS: 
> CortexService!arrow::internal::CountSetBits+16d 
> [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc
>  @ 99]
> 7ff7`2f3a4e4d f3480fb800  popcnt  rax,qword ptr [rax]
> FOLLOWUP_IP: 
> CortexService!arrow::internal::CountSetBits+16d 
> [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\

[jira] [Created] (ARROW-6663) [C++] Use software __builtin_popcountll when building without SSE4.2

2019-09-22 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6663:
---

 Summary: [C++] Use software __builtin_popcountll when building 
without SSE4.2
 Key: ARROW-6663
 URL: https://issues.apache.org/jira/browse/ARROW-6663
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


This is to be extra safe in the context of ARROW-5381
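
For illustration, a software population count could look roughly like the sketch below. It is written in Python rather than Arrow's C++, the name `popcount64` is hypothetical, and it uses the standard SWAR bit-counting trick in place of the hardware `popcnt` instruction. It is not the code from any Arrow pull request.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # Python ints are unbounded, so emulate a 64-bit word


def popcount64(x: int) -> int:
    """Count the set bits of a 64-bit word using only shifts, masks and adds."""
    x &= MASK64
    # Each 2-bit field now holds the number of set bits it contained.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum neighbouring 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum neighbouring 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # The multiply accumulates all byte counts into the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56
```

A fallback of this shape runs on any CPU, at the cost of a handful of extra arithmetic instructions per word.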





[jira] [Created] (ARROW-6662) [Java] Implement equals/approxEquals API for VectorSchemaRoot

2019-09-22 Thread Ji Liu (Jira)
Ji Liu created ARROW-6662:
-

 Summary: [Java] Implement equals/approxEquals API for 
VectorSchemaRoot
 Key: ARROW-6662
 URL: https://issues.apache.org/jira/browse/ARROW-6662
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


With the newly added visitor APIs (ARROW-6211), we can now implement 
equals/approxEquals for VectorSchemaRoot.
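
As a rough sketch of the intended semantics only (Python, not the Java API): the toy "root" below is a plain dict mapping column names to lists, and the names `roots_equal` and `roots_approx_equal` as well as the `epsilon` parameter are assumptions made for illustration. The pluggable `value_eq` comparator loosely stands in for the visitor.

```python
import math


def roots_equal(a, b, value_eq=lambda x, y: x == y):
    """Compare two toy roots column by column; None marks a null slot."""
    if list(a.keys()) != list(b.keys()):
        return False  # different schema (names or order)
    for name in a:
        col_a, col_b = a[name], b[name]
        if len(col_a) != len(col_b):
            return False
        for x, y in zip(col_a, col_b):
            if x is None or y is None:
                if x is not y:
                    return False  # null only matches null
            elif not value_eq(x, y):
                return False
    return True


def roots_approx_equal(a, b, epsilon=1e-6):
    """Like roots_equal, but floats may differ by up to epsilon."""
    def close(x, y):
        if isinstance(x, float) and isinstance(y, float):
            return math.isclose(x, y, abs_tol=epsilon)
        return x == y
    return roots_equal(a, b, value_eq=close)
```

Exact and approximate comparison share one traversal here and differ only in the per-value comparator.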





[jira] [Created] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot

2019-09-22 Thread Ji Liu (Jira)
Ji Liu created ARROW-6661:
-

 Summary: [Java] Implement APIs like slice to enhance 
VectorSchemaRoot
 Key: ARROW-6661
 URL: https://issues.apache.org/jira/browse/ARROW-6661
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Unlike C++/Python, the Java implementation currently has no APIs like slice 
for record batches.

This issue is to implement slice/getVector/addVector/removeVector.
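
To make the requested operations concrete, here is a minimal Python sketch of their semantics on a toy batch. The class `Batch` and its method names are hypothetical and are not the Java API. Unlike a real Arrow slice, which can share buffers, this toy version copies the sliced lists.

```python
class Batch:
    """Toy record batch: equal-length named columns stored as Python lists."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = dict(columns)
        self.row_count = lengths.pop() if lengths else 0

    def slice(self, offset, length):
        """Return a new batch holding rows [offset, offset + length)."""
        if offset < 0 or length < 0 or offset + length > self.row_count:
            raise IndexError("slice out of range")
        return Batch({name: values[offset:offset + length]
                      for name, values in self.columns.items()})

    def get_column(self, name):
        return self.columns[name]

    def add_column(self, name, values):
        """Return a new batch with one more column appended."""
        if name in self.columns:
            raise KeyError(f"column {name!r} already exists")
        if self.columns and len(values) != self.row_count:
            raise ValueError("new column has the wrong length")
        return Batch({**self.columns, name: list(values)})

    def remove_column(self, name):
        """Return a new batch without the named column."""
        if name not in self.columns:
            raise KeyError(name)
        return Batch({n: v for n, v in self.columns.items() if n != name})
```

Each operation returns a new batch and leaves the original untouched, which is one possible design; an in-place variant would be equally plausible.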





[jira] [Assigned] (ARROW-6352) [Java] Add implementation of DenseUnionVector.

2019-09-22 Thread Liya Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan reassigned ARROW-6352:
---

Assignee: Liya Fan

> [Java] Add implementation of DenseUnionVector.
> --
>
> Key: ARROW-6352
> URL: https://issues.apache.org/jira/browse/ARROW-6352
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Micah Kornfield
>Assignee: Liya Fan
>Priority: Major
>
> Today only sparse unions are supported.  We should have a dense union 
> vector implementation that conforms to the IPC protocol (the current sparse 
> union vector doesn't do this, and there are other JIRAs covering making it 
> compatible).
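
The difference between the two layouts can be sketched in a few lines of Python. This illustrates the columnar layouts only, with hypothetical helper names and plain lists standing in for buffers; it is not the Java vector classes. A sparse union keeps every child at full length, while a dense union keeps compact children plus an offsets array.

```python
def encode_sparse(values, num_types):
    """values: list of (type_id, value). Every child has len(values) slots."""
    type_ids = [t for t, _ in values]
    children = [[None] * len(values) for _ in range(num_types)]
    for i, (t, v) in enumerate(values):
        children[t][i] = v  # the other children keep a placeholder at slot i
    return type_ids, children


def encode_dense(values, num_types):
    """Children hold only their own values; offsets index into them."""
    type_ids, offsets = [], []
    children = [[] for _ in range(num_types)]
    for t, v in values:
        type_ids.append(t)
        offsets.append(len(children[t]))  # position inside the chosen child
        children[t].append(v)
    return type_ids, offsets, children


def dense_get(type_ids, offsets, children, i):
    """Read logical slot i of a dense union."""
    return children[type_ids[i]][offsets[i]]
```

The dense form trades one extra offsets array for children that carry no placeholder slots.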





[jira] [Resolved] (ARROW-6621) [Rust][DataFusion] Examples for DataFusion are not executed in CI

2019-09-22 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6621.

Fix Version/s: 0.15.0  (was: 1.0.0)
Resolution: Fixed

Issue resolved by pull request 5467
[https://github.com/apache/arrow/pull/5467]

> [Rust][DataFusion] Examples for DataFusion are not executed in CI
> -
>
> Key: ARROW-6621
> URL: https://issues.apache.org/jira/browse/ARROW-6621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Assignee: Andy Grove
>Priority: Minor
>  Labels: beginner, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See the CI scripts; we already test the examples for the Arrow sub-crate.





[jira] [Updated] (ARROW-6621) [Rust][DataFusion] Examples for DataFusion are not executed in CI

2019-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6621:
--
Labels: beginner pull-request-available  (was: beginner)

> [Rust][DataFusion] Examples for DataFusion are not executed in CI
> -
>
> Key: ARROW-6621
> URL: https://issues.apache.org/jira/browse/ARROW-6621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Assignee: Andy Grove
>Priority: Minor
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>
> See the CI scripts; we already test the examples for the Arrow sub-crate.





[jira] [Updated] (ARROW-6660) [Rust] [DataFusion] Minor docs update for 0.15.0 release

2019-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6660:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Minor docs update for 0.15.0 release
> 
>
> Key: ARROW-6660
> URL: https://issues.apache.org/jira/browse/ARROW-6660
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> Minor docs update for 0.15.0 release



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6660) [Rust] [DataFusion] Minor docs update for 0.15.0 release

2019-09-22 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6660.

Resolution: Fixed

Issue resolved by pull request 5466
[https://github.com/apache/arrow/pull/5466]

> [Rust] [DataFusion] Minor docs update for 0.15.0 release
> 
>
> Key: ARROW-6660
> URL: https://issues.apache.org/jira/browse/ARROW-6660
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Minor docs update for 0.15.0 release





[jira] [Resolved] (ARROW-6353) [Python] Allow user to select compression level in pyarrow.parquet.write_table

2019-09-22 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6353.
-
Resolution: Fixed

Issue resolved by pull request 5446
[https://github.com/apache/arrow/pull/5446]

> [Python] Allow user to select compression level in pyarrow.parquet.write_table
> --
>
> Key: ARROW-6353
> URL: https://issues.apache.org/jira/browse/ARROW-6353
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Igor Yastrebov
>Assignee: Martin Radev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This feature was introduced for C++ in 
> [ARROW-6216|https://issues.apache.org/jira/browse/ARROW-6216].





[jira] [Commented] (ARROW-5381) [C++] Crash at arrow::internal::CountSetBits

2019-09-22 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935409#comment-16935409
 ] 

Wes McKinney commented on ARROW-5381:
-

I put up a PR to get the project running on 2009-era Intel architecture. I am 
not sure this will fix the Windows issue, though.

We might add a software implementation of {{__builtin_popcountll}} just in case:

https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/portability.h#L162

> [C++] Crash at arrow::internal::CountSetBits
> 
>
> Key: ARROW-5381
> URL: https://issues.apache.org/jira/browse/ARROW-5381
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Operating System: Windows 7 Professional 64-bit (6.1, 
> Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429)
> Language: English (Regional Setting: English)
> System Manufacturer: SAMSUNG ELECTRONICS CO., LTD.
> System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520
> BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ
> Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz
> Memory: 2048MB RAM
> Available OS Memory: 1962MB RAM
>   Page File: 1517MB used, 2405MB available
> Windows Dir: C:\Windows
> DirectX Version: DirectX 11
>Reporter: Tham
>Priority: Major
>  Labels: pull-request-available
> Attachments: bit-util.asm, iMac-late2009.png, popcnt_support.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've got a lot of crash dumps from a customer's Windows machine. The 
> stack trace shows that it crashed at arrow::internal::CountSetBits.
>  

[jira] [Updated] (ARROW-5381) [C++] Crash at arrow::internal::CountSetBits

2019-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5381:
--
Labels: pull-request-available  (was: )

> [C++] Crash at arrow::internal::CountSetBits
> 
>
> Key: ARROW-5381
> URL: https://issues.apache.org/jira/browse/ARROW-5381
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Operating System: Windows 7 Professional 64-bit (6.1, 
> Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429)
> Language: English (Regional Setting: English)
> System Manufacturer: SAMSUNG ELECTRONICS CO., LTD.
> System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520
> BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ
> Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz
> Memory: 2048MB RAM
> Available OS Memory: 1962MB RAM
>   Page File: 1517MB used, 2405MB available
> Windows Dir: C:\Windows
> DirectX Version: DirectX 11
>Reporter: Tham
>Priority: Major
>  Labels: pull-request-available
> Attachments: bit-util.asm, iMac-late2009.png, popcnt_support.png
>
>
> I've got a lot of crash dumps from a customer's Windows machine. The 
> stack trace shows that it crashed at arrow::internal::CountSetBits.
>  

[jira] [Updated] (ARROW-6621) [Rust][DataFusion] Examples for DataFusion are not executed in CI

2019-09-22 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-6621:
--
Fix Version/s: 1.0.0

> [Rust][DataFusion] Examples for DataFusion are not executed in CI
> -
>
> Key: ARROW-6621
> URL: https://issues.apache.org/jira/browse/ARROW-6621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Assignee: Andy Grove
>Priority: Minor
>  Labels: beginner
> Fix For: 1.0.0
>
>
> See the CI scripts; we already test the examples for the Arrow sub-crate.





[jira] [Assigned] (ARROW-6621) [Rust][DataFusion] Examples for DataFusion are not executed in CI

2019-09-22 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-6621:
-

Assignee: Andy Grove

> [Rust][DataFusion] Examples for DataFusion are not executed in CI
> -
>
> Key: ARROW-6621
> URL: https://issues.apache.org/jira/browse/ARROW-6621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Assignee: Andy Grove
>Priority: Minor
>  Labels: beginner
>
> See the CI scripts; we already test the examples for the Arrow sub-crate.





[jira] [Resolved] (ARROW-5717) [Python] Support dictionary unification when converting variable dictionaries to pandas

2019-09-22 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5717.
-
Resolution: Fixed

Issue resolved by pull request 5458
[https://github.com/apache/arrow/pull/5458]

> [Python] Support dictionary unification when converting variable dictionaries 
> to pandas
> ---
>
> Key: ARROW-5717
> URL: https://issues.apache.org/jira/browse/ARROW-5717
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Follow up work to ARROW-5335





[jira] [Created] (ARROW-6660) [Rust] [DataFusion] Minor docs update for 0.15.0 release

2019-09-22 Thread Andy Grove (Jira)
Andy Grove created ARROW-6660:
-

 Summary: [Rust] [DataFusion] Minor docs update for 0.15.0 release
 Key: ARROW-6660
 URL: https://issues.apache.org/jira/browse/ARROW-6660
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.15.0


Minor docs update for 0.15.0 release





[jira] [Created] (ARROW-6659) [Rust] [DataFusion] Refactor of HashAggregateExec to support custom merge

2019-09-22 Thread Andy Grove (Jira)
Andy Grove created ARROW-6659:
-

 Summary: [Rust] [DataFusion] Refactor of HashAggregateExec to 
support custom merge
 Key: ARROW-6659
 URL: https://issues.apache.org/jira/browse/ARROW-6659
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove


HashAggregateExec currently creates one HashPartition per input partition for 
the initial per-partition aggregate, then explicitly calls MergeExec, and then 
creates another HashPartition for the final reduce operation.

This is fine for in-memory queries in DataFusion but is not extensible. For 
example, it is not possible to provide a different MergeExec implementation 
that would distribute queries to a cluster.

A better design would be to move the logic into the query planner so that the 
physical plan contains explicit steps such as:

 
{code:java}
- HashAggregate // final aggregate
  - MergeExec
    - HashAggregate // aggregate per partition
{code}
This would then make it easier to customize the plan in other projects, to 
support distributed execution:
{code:java}
- HashAggregate // final aggregate
  - MergeExec
    - DistributedExec
      - HashAggregate // aggregate per partition
{code}
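
The explicit steps in the plans above can be sketched as a small Python pipeline. This is an illustration of two-phase aggregation with a pluggable merge step, not DataFusion's Rust code; all function names are hypothetical, and SUM is used as the example aggregate.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Per-partition aggregate: map each group key to a partial sum."""
    acc = defaultdict(int)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def merge(partials):
    """Default merge step: stream all partial results into one sequence."""
    for partial in partials:
        yield from partial.items()


def final_aggregate(rows):
    """Final aggregate: combine the partial sums per group key."""
    acc = defaultdict(int)
    for key, partial_sum in rows:
        acc[key] += partial_sum
    return dict(acc)


def run_plan(partitions, merge_step=merge):
    """final aggregate <- merge <- aggregate per partition."""
    return final_aggregate(merge_step(partial_aggregate(p) for p in partitions))
```

Because the merge step is an explicit, replaceable stage, a different project could pass its own `merge_step` (for example one that gathers partial results from remote workers) without touching either aggregate.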





[jira] [Commented] (ARROW-5381) [C++] Crash at arrow::internal::CountSetBits

2019-09-22 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935364#comment-16935364
 ] 

Wes McKinney commented on ARROW-5381:
-

I have a pre-SSE4.2 laptop in hand now and will spend a few minutes to see 
what's involved with getting the software running properly in this environment. 

> [C++] Crash at arrow::internal::CountSetBits
> 
>
> Key: ARROW-5381
> URL: https://issues.apache.org/jira/browse/ARROW-5381
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Operating System: Windows 7 Professional 64-bit (6.1, 
> Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429)
> Language: English (Regional Setting: English)
> System Manufacturer: SAMSUNG ELECTRONICS CO., LTD.
> System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520
> BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ
> Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz
> Memory: 2048MB RAM
> Available OS Memory: 1962MB RAM
>   Page File: 1517MB used, 2405MB available
> Windows Dir: C:\Windows
> DirectX Version: DirectX 11
>Reporter: Tham
>Priority: Major
> Attachments: bit-util.asm, iMac-late2009.png, popcnt_support.png
>
>
> I've got a lot of crash dumps from a customer's Windows machine. The 
> stack trace shows that it crashed at arrow::internal::CountSetBits.
>  

[jira] [Updated] (ARROW-6656) [Rust] [DataFusion] Implement MIN and MAX aggregate expressions

2019-09-22 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-6656:
--
Summary: [Rust] [DataFusion] Implement MIN and MAX aggregate expressions  
(was: [Rust] [DataFusion] Implement MIN and MAX)

> [Rust] [DataFusion] Implement MIN and MAX aggregate expressions
> ---
>
> Key: ARROW-6656
> URL: https://issues.apache.org/jira/browse/ARROW-6656
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> Implement MIN and MAX aggregate expressions. See the SUM implementation for 
> inspiration.





[jira] [Updated] (ARROW-6656) [Rust] [DataFusion] Implement MIN and MAX aggregate expressions

2019-09-22 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-6656:
--
Labels: beginner  (was: )

> [Rust] [DataFusion] Implement MIN and MAX aggregate expressions
> ---
>
> Key: ARROW-6656
> URL: https://issues.apache.org/jira/browse/ARROW-6656
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: Rust, Rust - DataFusion
> Reporter: Andy Grove
> Priority: Major
> Labels: beginner
> Fix For: 1.0.0
>
>
> Implement MIN and MAX aggregate expressions. See the SUM implementation for 
> inspiration.





[jira] [Created] (ARROW-6658) [Rust] [DataFusion] Implement AVG aggregate expression

2019-09-22 Thread Andy Grove (Jira)
Andy Grove created ARROW-6658:
-

 Summary: [Rust] [DataFusion] Implement AVG aggregate expression
 Key: ARROW-6658
 URL: https://issues.apache.org/jira/browse/ARROW-6658
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove
 Fix For: 1.0.0


Implement AVG aggregate expression. See COUNT and SUM for inspiration.
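
As a rough illustration of why COUNT and SUM are the natural starting points, AVG can be sketched as a running sum plus a running count of non-null inputs. The type and method names below (`AvgAccumulator`, `accumulate`, `result`) are hypothetical and are not taken from the DataFusion code base; `None` stands in for a SQL NULL.

```rust
/// Hypothetical accumulator for AVG; not DataFusion's actual API.
#[derive(Default)]
struct AvgAccumulator {
    /// Running SUM of the non-null inputs.
    sum: f64,
    /// Running COUNT of the non-null inputs.
    count: u64,
}

impl AvgAccumulator {
    /// Fold one input value into the state; `None` (NULL) is skipped.
    fn accumulate(&mut self, value: Option<f64>) {
        if let Some(v) = value {
            self.sum += v;
            self.count += 1;
        }
    }

    /// AVG = SUM / COUNT, or `None` when no non-null value was seen.
    fn result(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

Returning `None` for an empty or all-NULL input mirrors the usual SQL behaviour of AVG and avoids a division by zero.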





[jira] [Updated] (ARROW-6657) [Rust] [DataFusion] Implement COUNT aggregate expression

2019-09-22 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-6657:
--
Labels: beginner  (was: )

> [Rust] [DataFusion] Implement COUNT aggregate expression
> 
>
> Key: ARROW-6657
> URL: https://issues.apache.org/jira/browse/ARROW-6657
> Project: Apache Arrow
> Issue Type: Sub-task
> Reporter: Andy Grove
> Priority: Major
> Labels: beginner
> Fix For: 1.0.0
>
>
> Implement COUNT aggregate expressions. See the SUM implementation for 
> inspiration.





[jira] [Created] (ARROW-6657) [Rust] [DataFusion] Implement COUNT aggregate expression

2019-09-22 Thread Andy Grove (Jira)
Andy Grove created ARROW-6657:
-

 Summary: [Rust] [DataFusion] Implement COUNT aggregate expression
 Key: ARROW-6657
 URL: https://issues.apache.org/jira/browse/ARROW-6657
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove
 Fix For: 1.0.0


Implement COUNT aggregate expressions. See the SUM implementation for 
inspiration.





[jira] [Created] (ARROW-6656) [Rust] [DataFusion] Implement MIN and MAX

2019-09-22 Thread Andy Grove (Jira)
Andy Grove created ARROW-6656:
-

 Summary: [Rust] [DataFusion] Implement MIN and MAX
 Key: ARROW-6656
 URL: https://issues.apache.org/jira/browse/ARROW-6656
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Implement MIN and MAX aggregate expressions. See the SUM implementation for 
inspiration.
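
A minimal sketch of what such expressions might look like, assuming an accumulator-style design similar to SUM. The `Accumulator` trait, the struct names, and the `aggregate` helper are hypothetical and are not DataFusion's actual API; `None` stands in for a SQL NULL.

```rust
/// Hypothetical accumulator interface; DataFusion's real trait may differ.
trait Accumulator {
    /// Fold one input value into the running state; `None` models NULL.
    fn accumulate(&mut self, value: Option<i64>);
    /// Return the aggregate, or `None` if no non-null value was seen.
    fn result(&self) -> Option<i64>;
}

#[derive(Default)]
struct MinAccumulator {
    min: Option<i64>,
}

impl Accumulator for MinAccumulator {
    fn accumulate(&mut self, value: Option<i64>) {
        if let Some(v) = value {
            // Keep the smaller of the current minimum and the new value.
            self.min = Some(match self.min {
                Some(current) => current.min(v),
                None => v,
            });
        }
    }

    fn result(&self) -> Option<i64> {
        self.min
    }
}

#[derive(Default)]
struct MaxAccumulator {
    max: Option<i64>,
}

impl Accumulator for MaxAccumulator {
    fn accumulate(&mut self, value: Option<i64>) {
        if let Some(v) = value {
            // Keep the larger of the current maximum and the new value.
            self.max = Some(match self.max {
                Some(current) => current.max(v),
                None => v,
            });
        }
    }

    fn result(&self) -> Option<i64> {
        self.max
    }
}

/// Feed every value of a column into an accumulator and return its result.
fn aggregate<A: Accumulator>(mut acc: A, values: &[Option<i64>]) -> Option<i64> {
    for v in values {
        acc.accumulate(*v);
    }
    acc.result()
}
```

Unlike SUM, the state here needs no identity value: starting from `None` lets an all-NULL column yield NULL rather than an arbitrary sentinel.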





[jira] [Resolved] (ARROW-6303) [Rust] Add a feature to disable SIMD

2019-09-22 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-6303.
---
Resolution: Fixed

Issue resolved by pull request 5269
[https://github.com/apache/arrow/pull/5269]

> [Rust] Add a feature to disable SIMD
> 
>
> Key: ARROW-6303
> URL: https://issues.apache.org/jira/browse/ARROW-6303
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Paddy Horan
> Assignee: Paddy Horan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> We should allow building without SIMD behind a feature flag. SIMD has caused 
> issues for some users; see https://issues.apache.org/jira/browse/ARROW-5613
> I could not reproduce this, but it would be good to provide a workaround for 
> affected users.
> Also, this would bring us closer to building on stable Rust again.
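
For readers unfamiliar with the mechanism, a Cargo feature flag can gate a SIMD code path roughly as sketched below. This is only an illustration under stated assumptions: the feature name `simd` and the `sum` function are hypothetical, the sketch uses stable `std::arch` SSE intrinsics rather than whatever SIMD library the Arrow crate uses, and it is not the change made in pull request 5269. It assumes the feature is declared in `Cargo.toml` under `[features]`.

```rust
/// SIMD path: compiled only when the hypothetical `simd` feature is enabled
/// on an x86_64 target.
#[cfg(all(feature = "simd", target_arch = "x86_64"))]
fn sum(values: &[f32]) -> f32 {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_setzero_ps, _mm_storeu_ps};

    let chunks = values.chunks_exact(4);
    // Elements that do not fill a whole 4-lane vector are summed separately.
    let tail: f32 = chunks.remainder().iter().sum();
    let mut lanes = [0.0f32; 4];
    // SAFETY: SSE is part of the x86_64 baseline, each chunk holds exactly
    // four f32 values, and unaligned loads/stores are used throughout.
    unsafe {
        let mut acc = _mm_setzero_ps();
        for chunk in chunks {
            acc = _mm_add_ps(acc, _mm_loadu_ps(chunk.as_ptr()));
        }
        _mm_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    lanes.iter().sum::<f32>() + tail
}

/// Scalar fallback: used whenever the feature is off or the target differs.
#[cfg(not(all(feature = "simd", target_arch = "x86_64")))]
fn sum(values: &[f32]) -> f32 {
    values.iter().sum()
}
```

Because both definitions share one signature, callers are unaffected by the flag; a user who hits a SIMD-related problem would simply build without the feature.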





[jira] [Created] (ARROW-6655) [Python] Filesystem bindings for S3

2019-09-22 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6655:
--

 Summary: [Python] Filesystem bindings for S3
 Key: ARROW-6655
 URL: https://issues.apache.org/jira/browse/ARROW-6655
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


Follow-up work of ARROW-5494: [Python] Create FileSystem bindings





[jira] [Resolved] (ARROW-6639) [Packaging][RPM] Add support for CentOS 7 on aarch64

2019-09-22 Thread Sutou Kouhei (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sutou Kouhei resolved ARROW-6639.
-
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5448
[https://github.com/apache/arrow/pull/5448]

> [Packaging][RPM] Add support for CentOS 7 on aarch64
> 
>
> Key: ARROW-6639
> URL: https://issues.apache.org/jira/browse/ARROW-6639
> Project: Apache Arrow
> Issue Type: Bug
> Components: Packaging
> Reporter: Kentaro Hayashi
> Assignee: Sutou Kouhei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The apt:build rake task supports specifying the architecture to run on [1], 
> but the yum task does not.
>  [1] 
> [https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/package-task.rb#L276]
> It would be useful for the yum task to support an architecture option 
> (e.g. i386) as well, even though CentOS 6 i386 reaches EOL in 2020/11.
>  


