[jira] [Resolved] (ARROW-6933) [Java] Support linear dictionary encoder

2019-10-23 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-6933.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5692
[https://github.com/apache/arrow/pull/5692]

> [Java] Support linear dictionary encoder
> ---
>
> Key: ARROW-6933
> URL: https://issues.apache.org/jira/browse/ARROW-6933
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> For many scenarios, the distribution of dictionary entries is highly skewed. 
> In other words, a few dictionary entries occur much more frequently than 
> others. If we sort the dictionary in non-increasing order of entry 
> frequency, and compare each value to encode against the entries starting 
> from the beginning of the dictionary, we get the following benefits:
> 1)  We need no extra memory space or data structure.
> 2)  The search is extremely efficient, as we are likely to find a match 
> in the first few entries of the dictionary.
> This is the basic idea behind the linear dictionary encoder. When the 
> scenario is right (highly skewed dictionary distribution), it outperforms 
> both search-based and hash-table-based encoders.
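
The idea described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class name `LinearDictionarySketch` and its methods are hypothetical and are not the Arrow Java API, and the dictionary is assumed to be already sorted by non-increasing entry frequency.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of linear dictionary encoding; not the Arrow Java API. */
public final class LinearDictionarySketch {

    /**
     * Encodes each value as its index in the dictionary by scanning the
     * dictionary from the front. No hash table or other auxiliary structure
     * is built. The dictionary is assumed to be sorted by non-increasing
     * entry frequency, so frequent values are matched after few comparisons.
     */
    public static int[] encode(List<String> dictionary, List<String> values) {
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            indices[i] = indexOf(dictionary, values.get(i));
        }
        return indices;
    }

    /** Linear scan from the beginning of the dictionary. */
    private static int indexOf(List<String> dictionary, String value) {
        for (int j = 0; j < dictionary.size(); j++) {
            if (dictionary.get(j).equals(value)) {
                return j;
            }
        }
        throw new IllegalArgumentException("Value not in dictionary: " + value);
    }

    public static void main(String[] args) {
        // Most frequent entry first, rarest entry last.
        List<String> dictionary = Arrays.asList("the", "of", "zebra");
        List<String> values = Arrays.asList("the", "the", "of", "the", "zebra");
        System.out.println(Arrays.toString(encode(dictionary, values))); // prints [0, 0, 1, 0, 2]
    }
}
```

In this example, three of the five values match on the first comparison, which is where the benefit of the frequency ordering comes from. With a uniform distribution the same scan would cost time proportional to the dictionary size per value.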



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6866) [Java] Improve the performance of calculating hash code for struct vector

2019-10-23 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-6866.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5633
[https://github.com/apache/arrow/pull/5633]

> [Java] Improve the performance of calculating hash code for struct vector
> -
>
> Key: ARROW-6866
> URL: https://issues.apache.org/jira/browse/ARROW-6866
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Improve the performance of the hashCode(int) method for StructVector:
> 1. We can get the child vectors directly, so there is no need to get the name 
> from the child vector and then use the name to look up the vector. 
> 2. The child vectors cannot be null, so there is no need to check for null.
> The performance improvement depends on the complexity of the hash algorithm. 
> For computationally intensive hash algorithms, the improvement can be small, 
> while for simple hash algorithms, the improvement can be notable. 





[jira] [Resolved] (ARROW-6672) [Java] Extract a common interface for dictionary builders

2019-10-23 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-6672.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5486
[https://github.com/apache/arrow/pull/5486]

> [Java] Extract a common interface for dictionary builders
> -
>
> Key: ARROW-6672
> URL: https://issues.apache.org/jira/browse/ARROW-6672
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We need a common interface for dictionary builders to support more 
> sophisticated scenarios, like collecting dictionary statistics.





[jira] [Resolved] (ARROW-6394) [Java] Support conversions between delta vector and partial sum vector

2019-10-23 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-6394.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5235
[https://github.com/apache/arrow/pull/5235]

> [Java] Support conversions between delta vector and partial sum vector
> --
>
> Key: ARROW-6394
> URL: https://issues.apache.org/jira/browse/ARROW-6394
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> What is a delta vector/partial sum vector?
> Given an integer vector a with length n, its partial sum vector is another 
> integer vector b with length n + 1, with values defined as:
> b(0) = initial sum
> b(i) = b(0) + a(0) + a(1) + ... + a(i - 1), i = 1, 2, ..., n
> Given an integer vector a with length n + 1, its delta vector is another 
> integer vector b with length n, with values defined as:
> b(i) = a(i + 1) - a(i), i = 0, 1, ..., n - 1
> In this issue, we provide utilities to convert between delta vector and 
> partial sum vector. It is interesting to note that the two operations 
> correspond to discrete integration and differentiation.
> These conversions have wide applications. For example,
> 1. The run-length vector proposed by Micah is based on the partial sum 
> vector, while the deduplication functionality is based on delta vector. This 
> issue provides conversions between them.
> 2. The current VarCharVector/VarBinaryVector implementations are based on 
> partial sum vector. We can transform them to delta vectors before IPC, to 
> reduce network traffic.
> 3. Converting to delta can be considered as a way for data compression. To 
> further reduce the data volume, the operation can be applied more than once.
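
The two conversions defined above can be sketched with plain int arrays. This is a minimal illustration, not the utility added by the pull request: the class name `PartialSumSketch` and both method names are hypothetical, and overflow handling is left out.

```java
import java.util.Arrays;

/** Hypothetical sketch of delta <-> partial sum conversion on int arrays. */
public final class PartialSumSketch {

    /**
     * Builds the partial sum vector: result[0] = initialSum and
     * result[i + 1] = result[i] + deltas[i]. The result has length n + 1.
     */
    public static int[] toPartialSum(int[] deltas, int initialSum) {
        int[] sums = new int[deltas.length + 1];
        sums[0] = initialSum;
        for (int i = 0; i < deltas.length; i++) {
            sums[i + 1] = sums[i] + deltas[i];
        }
        return sums;
    }

    /**
     * Builds the delta vector: result[i] = sums[i + 1] - sums[i].
     * The input must have at least one element; the result has length n.
     */
    public static int[] toDelta(int[] sums) {
        if (sums.length == 0) {
            throw new IllegalArgumentException("A partial sum vector has at least one element");
        }
        int[] deltas = new int[sums.length - 1];
        for (int i = 0; i < deltas.length; i++) {
            deltas[i] = sums[i + 1] - sums[i];
        }
        return deltas;
    }

    public static void main(String[] args) {
        // Value lengths of three variable-width entries (delta form).
        int[] lengths = {3, 0, 5};
        // Offsets into a data buffer (partial sum form), starting at 0.
        int[] offsets = toPartialSum(lengths, 0);
        System.out.println(Arrays.toString(offsets));          // prints [0, 3, 3, 8]
        System.out.println(Arrays.toString(toDelta(offsets))); // prints [3, 0, 5]
    }
}
```

The example uses value lengths and buffer offsets because that mirrors item 2 above: an offset buffer is a partial sum vector of the value lengths. Converting to deltas drops the initial sum, so a round trip needs that value supplied again.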





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958490#comment-16958490
 ] 

Neal Richardson commented on ARROW-6977:


I made ARROW-6983 for the threading issue. ARROW-6979 is for the jemalloc for 
macOS R packages; do I remember correctly that jemalloc isn't available for 
Windows?

As for this issue, assuming it's innocuous, it's just annoying that every time 
I load the package I get this message. I don't know how to suppress it though. 
It would be nice if we could set background_thread: true in our code based on 
the same condition that jemalloc is checking.

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> <jemalloc>: option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Created] (ARROW-6983) [C++] Threaded task group crashes sometimes

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6983:
--

 Summary: [C++] Threaded task group crashes sometimes
 Key: ARROW-6983
 URL: https://issues.apache.org/jira/browse/ARROW-6983
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson
Assignee: Antoine Pitrou
 Fix For: 0.15.1


You can give this a more descriptive title :)

See discussion on ARROW-6977. 
https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8 seems to fix it.





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958435#comment-16958435
 ] 

Wes McKinney commented on ARROW-6977:
-

The thread pool thing seems to be a new issue, can we open a JIRA for that?

What do you want to do about the background thread issue? We can set decay_ms=0 
by default with no background thread, but I'm not sure what the performance 
impact will be.

Creating packages without jemalloc is not a good idea because of the 
performance implications. 



[jira] [Updated] (ARROW-6966) [Go] 32bit memset is null

2019-10-23 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6966:

Fix Version/s: 1.0.0

> [Go] 32bit memset is null
> -
>
> Key: ARROW-6966
> URL: https://issues.apache.org/jira/browse/ARROW-6966
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jonathan A Sternberg
>Assignee: Jonathan A Sternberg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If you use a function that calls `memset.Set`, the implementation on a 32 bit 
> machine seems to be unset. This happened in our 32 bit build here:
> [https://circleci.com/gh/influxdata/influxdb/66112#tests/containers/2]
> {code:java}
> goroutine 66 [running]:
> testing.tRunner.func1(0x9e1f2c0)
>     /usr/local/go/src/testing/testing.go:830 +0x30e
> panic(0x899cb40, 0x9403c40)
>     /usr/local/go/src/runtime/panic.go:522 +0x16e
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/memory.Set(...)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/memory/memory.go:25
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*builder).init(0x9e44990, 0x20)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/builder.go:101 +0xc7
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*Int64Builder).init(0x9e44990, 0x20)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:102 +0x2f
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*Int64Builder).Resize(0x9e44990, 0x2)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:125 +0x42
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*builder).reserve(0x9e44990, 0x1, 0x9c52464)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/builder.go:138 +0x72
> github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array.(*Int64Builder).Reserve(0x9e44990, 0x1)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:113 +0x51
> github.com/influxdata/influxdb/vendor/github.com/influxdata/flux/arrow.NewInt(0x9e4a770, 0x1, 0x1, 0x0, 0x89f0360)
>     /root/go/src/github.com/influxdata/influxdb/vendor/github.com/influxdata/flux/arrow/int.go:10 +0x6c
> github.com/influxdata/influxdb/storage/reads.(*floatTable).advance(0x9e42070, 0x0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/table.gen.go:91 +0x7e
> github.com/influxdata/influxdb/storage/reads.newFloatTable(0x9e17740, 0xe521a160, 0x9e1b8c0, 0x0, 0x0, 0x1e, 0x0, 0x8c13be0, 0x9e448a0, 0x9e448d0, ...)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/table.gen.go:47 +0x1c2
> github.com/influxdata/influxdb/storage/reads.(*filterIterator).handleRead(0x9e22840, 0x9e0d1a0, 0x8c0ce00, 0x9e48780, 0x0, 0x0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/reader.go:177 +0x755
> github.com/influxdata/influxdb/storage/reads.(*filterIterator).Do(0x9e22840, 0x9e0d170, 0x9c40070, 0x0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/reader.go:140 +0x138
> github.com/influxdata/influxdb/storage/reads_test.TestDuplicateKeys_ReadFilter(0x9e1f2c0)
>     /root/go/src/github.com/influxdata/influxdb/storage/reads/reader_test.go:89 +0x1df
> testing.tRunner(0x9e1f2c0, 0x8ad44e4)
>     /usr/local/go/src/testing/testing.go:865 +0x97
> created by testing.(*T).Run
>     /usr/local/go/src/testing/testing.go:916 +0x2b2
> {code}
> I added a print statement at the point where memset is called, to print the 
> function that was being used, and got this:
> {code}
>  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] 0
> {code}
> If I set {{memset}} with a default, the code that calls into this works fine.
>  





[jira] [Assigned] (ARROW-6966) [Go] 32bit memset is null

2019-10-23 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-6966:
---

Assignee: Jonathan A Sternberg

> [Go] 32bit memset is null
> -
>
> Key: ARROW-6966
> URL: https://issues.apache.org/jira/browse/ARROW-6966
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jonathan A Sternberg
>Assignee: Jonathan A Sternberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h


[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958403#comment-16958403
 ] 

Neal Richardson commented on ARROW-6977:


I applied the patch and have run the R test suite 10 times in a row, all good.



[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958328#comment-16958328
 ] 

Neal Richardson commented on ARROW-6977:


Just a reminder that this is on master but I've rebuilt C++ with jemalloc off, 
so I'm no longer seeing that warning message. So this may be unrelated to the 
initial report. Feel free to move this to a different issue if you see fit.



[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958329#comment-16958329
 ] 

Antoine Pitrou commented on ARROW-6977:
---

Can you give this patch a try? Run with it a number of times.
https://gist.github.com/pitrou/87f3091c226db3306c45b2c32dd9aea8




[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958327#comment-16958327
 ] 

Neal Richardson commented on ARROW-6977:


Re-running a bunch now, I've crashed here (to the best of my estimation):

* https://github.com/apache/arrow/blob/master/r/tests/testthat/test-csv.R#L43
* https://github.com/apache/arrow/blob/master/r/tests/testthat/test-csv.R#L44
* https://github.com/apache/arrow/blob/master/r/tests/testthat/test-csv.R#L45 
(3x)
* 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-csv.R#L58-L59
* 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-Table.R#L133-L135
* https://github.com/apache/arrow/blob/master/r/tests/testthat/test-Table.R#L140
* 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-Table.R#L189-L192
* 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-Table.R#L196-L205

The common thread (excuse the pun) I *think* is that they take a Table and 
bring it into R as a data.frame. This function: 
https://github.com/apache/arrow/blob/master/r/src/array_to_vector.cpp#L811-L828

About 2 out of 3 times the test suite completes cleanly.



[jira] [Comment Edited] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2019-10-23 Thread Kyle McCarthy (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958321#comment-16958321
 ] 

Kyle McCarthy edited comment on ARROW-6947 at 10/23/19 10:59 PM:
-

I am curious to see if you have any ideas about how this would work. I have 
been working on a PoC, but will probably need to make some design decisions and 
would like to see if they align with yours.

At a high level, I see this working by composing a UDF with some general 
ScalarFunction type. Right now I have the ScalarFunction with type:

 
{code:java}
Box) -> Result{code}
 

so if a user defines a function such as
{code:java}
fn length(s: String) -> usize{code}
we would wrap that and return our ScalarFunction.

I think that the composed functions need to be associated with some "static" 
metadata, similar to the FunctionMeta in the logical plan. I think we would 
want to know the DataType of the arguments that the function expects and if 
they are optional, as well as the return type and if it is fallible/infallible.

If the UDF accepts and returns primitive Rust types, generating that metadata 
should be pretty straightforward. However, if the UDF takes/returns 
ScalarValues the user would have to explicitly provide the metadata.

We would be able to generate most of the data for the logical plan's 
FunctionMeta but would still need the function name and the field names for the 
args.

As of right now, I haven't done anything related to Aggregate UDFs or actually 
registering them with the ExecutionContext. 



> [Rust] [DataFusion] Add support for scalar UDFs
> ---
>
> Key: ARROW-6947
> URL: https://issues.apache.org/jira/browse/ARROW-6947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> As a user, I would like to be able to define my own functions and then use 
> them in SQL statements.
>  





[jira] [Comment Edited] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2019-10-23 Thread Kyle McCarthy (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958321#comment-16958321
 ] 

Kyle McCarthy edited comment on ARROW-6947 at 10/23/19 10:59 PM:
-

I am curious to see if you have any ideas about how this would work. I have 
been working on a PoC, but will probably need to make some design decisions and 
would like to see if they align with yours.

At a high level, I see this working by composing a UDF with some general 
ScalarFunction type. Right now I have the ScalarFunction with type: 
{code:java}
Box) -> Result{code}
so if a user defines a function such as
{code:java}
fn length(s: String) -> usize{code}
we would wrap that and return our ScalarFunction.
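
A minimal sketch of that wrapping step, for illustration only. The exact ScalarFunction type in the comment above did not survive formatting, so the `ScalarFunction` alias below is an assumption, and `ScalarValue`, `UdfError` and `wrap_string_to_usize` are hypothetical stand-ins rather than DataFusion's real definitions.

```rust
// Hypothetical stand-in for DataFusion's ScalarValue; only two variants are modeled.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Utf8(String),
    UInt64(u64),
}

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct UdfError(String);

type Result<T> = std::result::Result<T, UdfError>;

// Assumed shape of the general, type-erased function.
type ScalarFunction = Box<dyn Fn(Vec<ScalarValue>) -> Result<ScalarValue>>;

// The user-defined function from the comment.
fn length(s: String) -> usize {
    s.len()
}

// Wrap a one-argument `String -> usize` function into a ScalarFunction.
// The wrapper checks the argument count and type at call time.
fn wrap_string_to_usize(f: fn(String) -> usize) -> ScalarFunction {
    Box::new(move |args: Vec<ScalarValue>| {
        if args.len() != 1 {
            return Err(UdfError(format!("expected 1 argument, got {}", args.len())));
        }
        match args.into_iter().next() {
            Some(ScalarValue::Utf8(s)) => Ok(ScalarValue::UInt64(f(s) as u64)),
            Some(other) => Err(UdfError(format!("expected Utf8, got {:?}", other))),
            None => Err(UdfError("missing argument".to_string())),
        }
    })
}
```

With these definitions, `wrap_string_to_usize(length)` yields a boxed closure that can be called with a `Vec<ScalarValue>`. Note that even an infallible user function becomes fallible once wrapped, because the argument check happens at run time.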

I think that the composed functions need to be associated with some "static" 
metadata, similar to the FunctionMeta in the logical plan. I think we would 
want to know the DataType of the arguments that the function expects and if 
they are optional, as well as the return type and if it is fallible/infallible.

If the UDF accepts and returns primitive Rust types, generating that metadata 
should be pretty straightforward. However, if the UDF takes/returns 
ScalarValues the user would have to specifically provide the metadata.

We would be able to generate most of the data for the logical plan's 
FunctionMeta but would still need the function name and the field names for the 
args.
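
One way the metadata described above could be derived from primitive Rust types, sketched under stated assumptions: `DataType` is a cut-down stand-in for Arrow's enum, and `ArrowType`, `ArgMeta`, `UdfMeta` and `meta_for_unary` are hypothetical names, not the logical plan's actual FunctionMeta. Mapping `usize` to `UInt64` is also an assumption.

```rust
// Hypothetical, cut-down stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    UInt64,
    Float64,
}

// Hypothetical trait mapping a primitive Rust type to its DataType.
trait ArrowType {
    const DATA_TYPE: DataType;
    const OPTIONAL: bool = false;
}

impl ArrowType for String {
    const DATA_TYPE: DataType = DataType::Utf8;
}
impl ArrowType for usize {
    const DATA_TYPE: DataType = DataType::UInt64; // assumption: treat usize as 64-bit
}
impl ArrowType for f64 {
    const DATA_TYPE: DataType = DataType::Float64;
}
// An Option<T> argument is treated as optional (nullable).
impl<T: ArrowType> ArrowType for Option<T> {
    const DATA_TYPE: DataType = T::DATA_TYPE;
    const OPTIONAL: bool = true;
}

#[derive(Debug, Clone, PartialEq)]
struct ArgMeta {
    name: String,
    data_type: DataType,
    optional: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct UdfMeta {
    name: String,
    args: Vec<ArgMeta>,
    return_type: DataType,
    fallible: bool,
}

// Derive metadata for a one-argument infallible function. The types come from
// the signature; the function name and argument name must be supplied by the caller.
fn meta_for_unary<A: ArrowType, R: ArrowType>(
    _f: fn(A) -> R,
    name: &str,
    arg_name: &str,
) -> UdfMeta {
    UdfMeta {
        name: name.to_string(),
        args: vec![ArgMeta {
            name: arg_name.to_string(),
            data_type: A::DATA_TYPE,
            optional: A::OPTIONAL,
        }],
        return_type: R::DATA_TYPE,
        fallible: false,
    }
}

fn length(s: String) -> usize {
    s.len()
}
```

Calling `meta_for_unary(length, "length", "s")` fills in the argument and return types from the signature alone. A UDF that takes or returns ScalarValues has no such mapping, so its `UdfMeta` would have to be written out by hand.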

As of right now, I haven't done anything related to Aggregate UDFs or actually 
registering them with the ExecutionContext. 


> [Rust] [DataFusion] Add support for scalar UDFs
> ---
>
> Key: ARROW-6947
> URL: https://issues.apache.org/jira/browse/ARROW-6947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> As a user, I would like to be able to define my own functions and then use 
> them in SQL statements.
>  





[jira] [Comment Edited] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2019-10-23 Thread Kyle McCarthy (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958321#comment-16958321
 ] 

Kyle McCarthy edited comment on ARROW-6947 at 10/23/19 10:58 PM:
-

I am curious to see if you have any ideas about how this would work. I have 
been working on a PoC, but will probably need to make some design decisions and 
would like to see if they align with yours.

At a high level, I see this working by composing a UDF with some general 
ScalarFunction type. Right now I have the ScalarFunction with type:

```Box) -> Result```

so if a user defines a function such as

```fn length(s: String) -> usize```

we would wrap that and return our ScalarFunction.

I think that the composed functions need to be associated with some "static" 
metadata, similar to the FunctionMeta in the logical plan. I think we would 
want to know the DataType of the arguments that the function expects and if 
they are optional, as well as the return type and if it is fallible/infallible.

If the UDF accepts and returns primitive Rust types, generating that metadata 
should be pretty straightforward. However, if the UDF takes/returns 
ScalarValues the user would have to specifically provide the metadata.

We would be able to generate most of the data for the logical plan's 
FunctionMeta but would still need the function name and the field names for the 
args.

As of right now, I haven't done anything related to Aggregate UDFs or actually 
registering them with the ExecutionContext. 


> [Rust] [DataFusion] Add support for scalar UDFs
> ---
>
> Key: ARROW-6947
> URL: https://issues.apache.org/jira/browse/ARROW-6947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> As a user, I would like to be able to define my own functions and then use 
> them in SQL statements.
>  





[jira] [Comment Edited] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2019-10-23 Thread Kyle McCarthy (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958321#comment-16958321
 ] 

Kyle McCarthy edited comment on ARROW-6947 at 10/23/19 10:58 PM:
-

I am curious to see if you have any ideas about how this would work. I have 
been working on a PoC, but will probably need to make some design decisions and 
would like to see if they align with yours.

At a high level, I see this working by composing a UDF with some general 
ScalarFunction type. Right now I have the ScalarFunction with type ```Box) -> Result```, so if a user defines a 
function such as ```fn length(s: String) -> usize``` we would wrap that and 
return our ScalarFunction.

I think that the composed functions need to be associated with some "static" 
metadata, similar to the FunctionMeta in the logical plan. I think we would 
want to know the DataType of the arguments that the function expects and if 
they are optional, as well as the return type and if it is fallible/infallible.

If the UDF accepts and returns primitive Rust types, generating that metadata 
should be pretty straightforward. However, if the UDF takes/returns 
ScalarValues the user would have to specifically provide the metadata.

We would be able to generate most of the data for the logical plan's 
FunctionMeta but would still need the function name and the field names for the 
args.

As of right now, I haven't done anything related to Aggregate UDFs or actually 
registering them with the ExecutionContext. 


> [Rust] [DataFusion] Add support for scalar UDFs
> ---
>
> Key: ARROW-6947
> URL: https://issues.apache.org/jira/browse/ARROW-6947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> As a user, I would like to be able to define my own functions and then use 
> them in SQL statements.
>  





[jira] [Commented] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2019-10-23 Thread Kyle McCarthy (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958321#comment-16958321
 ] 

Kyle McCarthy commented on ARROW-6947:
--

I am curious to see if you have any ideas about how this would work. I have 
been working on a PoC, but will probably need to make some design decisions and 
would like to see if they align with yours.

At a high level, I see this working by composing a UDF with some general 
ScalarFunction type. Right now I have the ScalarFunction with type `Box) -> Result`, so if a user defines a function 
such as `fn length(s: String) -> usize` we would wrap that and return our 
ScalarFunction.

I think that the composed functions need to be associated with some "static" 
metadata, similar to the FunctionMeta in the logical plan. I think we would 
want to know the DataType of the arguments that the function expects and if 
they are optional, as well as the return type and if it is fallible/infallible.

If the UDF accepts and returns primitive Rust types, generating that metadata 
should be pretty straightforward. However, if the UDF takes/returns 
ScalarValues the user would have to specifically provide the metadata.

We would be able to generate most of the data for the logical plan's 
FunctionMeta but would still need the function name and the field names for the 
args.

As of right now, I haven't done anything related to Aggregate UDFs or actually 
registering them with the ExecutionContext. 

> [Rust] [DataFusion] Add support for scalar UDFs
> ---
>
> Key: ARROW-6947
> URL: https://issues.apache.org/jira/browse/ARROW-6947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> As a user, I would like to be able to define my own functions and then use 
> them in SQL statements.
>  





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958318#comment-16958318
 ] 

Neal Richardson commented on ARROW-6977:


On this particular crash (after 14 assertions passed), it looks like 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-csv.R#L119 
was the line that crashed. But like I said, it sometimes doesn't fail there, 
sometimes fails earlier, and sometimes fails in a different test file.

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958316#comment-16958316
 ] 

Neal Richardson commented on ARROW-6977:


It doesn't always fail in the same place. Sometimes I see three test assertions 
pass and then it fails, which means that it errors before the next assertion. 
Which would put it somewhere around 
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-csv.R#L43-L45.
 But other times it doesn't fail on that block and fails later. 

I'll try rewriting the tests to disambiguate and see if there's a pattern of 
where exactly it fails.

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958313#comment-16958313
 ] 

Antoine Pitrou commented on ARROW-6977:
---

Yeah, but what precisely does the failing test do?

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6925) [C++] Arrow fails to buld on MacOS 10.13.6 using brew gcc 7 and 8

2019-10-23 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958312#comment-16958312
 ] 

Wes McKinney commented on ARROW-6925:
-

[~fsaintjacques] John needed to be added to the "Contributor" role on JIRA -- 
done now. You are already an admin in JIRA for Apache Arrow, so you should get 
familiar with how to do this from the JIRA administration page, starting from 
the top right.

> [C++] Arrow fails to buld on MacOS 10.13.6 using brew gcc 7 and 8
> -
>
> Key: ARROW-6925
> URL: https://issues.apache.org/jira/browse/ARROW-6925
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: MacOS 10.13.6 using both brew gcc 7 and 8.
>Reporter: John Norris
>Assignee: John Norris
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Both SetupCxxFlags.cmake and ThirdpartyToolchain.cmake add -stdlib=libc++ to 
> the compiler flags when APPLE is true, but if you're using GCC from brew (or 
> presumably from anywhere other than Apple), this flag is not recognized and 
> the build fails.





[jira] [Assigned] (ARROW-6925) [C++] Arrow fails to buld on MacOS 10.13.6 using brew gcc 7 and 8

2019-10-23 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-6925:
---

Assignee: John Norris

> [C++] Arrow fails to buld on MacOS 10.13.6 using brew gcc 7 and 8
> -
>
> Key: ARROW-6925
> URL: https://issues.apache.org/jira/browse/ARROW-6925
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: MacOS 10.13.6 using both brew gcc 7 and 8.
>Reporter: John Norris
>Assignee: John Norris
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Both SetupCxxFlags.cmake and ThirdpartyToolchain.cmake add -stdlib=libc++ to 
> the compiler flags when APPLE is true, but if you're using GCC from brew (or 
> presumably from anywhere other than Apple), this flag is not recognized and 
> the build fails.





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958309#comment-16958309
 ] 

Neal Richardson commented on ARROW-6977:


{code}
(lldb) bt all
libR.dylib was compiled with optimization - stepping may behave oddly; 
variables may not be available.
  thread #1, queue = 'com.apple.main-thread'
frame #0: 0x000100170660 libR.dylib`R_HashGet(hashcode=117, 
symbol=0x0001010689f8, table=) at envir.c:0 [opt]
frame #1: 0x000100171246 
libR.dylib`Rf_findFun3(symbol=0x0001010689f8, rho=0x00010583a680, 
call=) at envir.c:1521:11 [opt]
frame #2: 0x0001001875d3 libR.dylib`bcEval(body=, 
rho=0x00011284c468, useCache=) at eval.c:6560:15 [opt]
frame #3: 0x000100182aed libR.dylib`Rf_eval(e=, 
rho=) at eval.c:620:8 [opt]
frame #4: 0x0001001a13a9 
libR.dylib`R_execClosure(call=0x000105f48098, newrho=, 
sysparent=, rho=0x000112852f20, arglist=, 
op=) at eval.c:0:19 [opt]
frame #5: 0x0001001a02aa 
libR.dylib`Rf_applyClosure(call=0x000105f48098, op=0x00010995f8b0, 
arglist=0x00011284c190, rho=0x000112852f20, 
suppliedvars=0x0001018058e0) at eval.c:1706:16 [opt]
frame #6: 0x000100189f11 libR.dylib`bcEval(body=, 
rho=0x000112852f20, useCache=) at eval.c:6733:12 [opt]
frame #7: 0x000100182aed libR.dylib`Rf_eval(e=, 
rho=) at eval.c:620:8 [opt]
frame #8: 0x0001001a13a9 
libR.dylib`R_execClosure(call=0x0001193dcc80, newrho=, 
sysparent=, rho=0x0001193eb620, arglist=, 
op=) at eval.c:0:19 [opt]
frame #9: 0x0001001a02aa 
libR.dylib`Rf_applyClosure(call=0x0001193dcc80, op=0x000105f34f28, 
arglist=0x000112852d28, rho=0x0001193eb620, 
suppliedvars=0x0001018058e0) at eval.c:1706:16 [opt]
frame #10: 0x00010018301d libR.dylib`Rf_eval(e=0x0001193dcc80, 
rho=0x0001193eb620) at eval.c:743:12 [opt]
frame #11: 0x0001001a3a20 libR.dylib`do_begin(call=0x0001193da158, 
op=0x00010180f000, args=0x0001193dcba0, rho=) at 
eval.c:2382:10 [opt]
frame #12: 0x000100182ce0 libR.dylib`Rf_eval(e=, 
rho=0x0001193eb620) at eval.c:695:12 [opt]
frame #13: 0x00010019fa63 libR.dylib`forcePromise(e=0x0001129e8938) 
at eval.c:516:8 [opt]
frame #14: 0x000100182dd0 libR.dylib`Rf_eval(e=, 
rho=) at eval.c:643:9 [opt]
frame #15: 0x0001001a3a20 libR.dylib`do_begin(call=0x000109947450, 
op=0x00010180f000, args=0x0001099476b8, rho=) at 
eval.c:2382:10 [opt]
frame #16: 0x000100182ce0 libR.dylib`Rf_eval(e=, 
rho=0x0001129dae78) at eval.c:695:12 [opt]
frame #17: 0x0001001a4d66 libR.dylib`do_eval(call=, 
op=0x0001018260b0, args=, rho=) at eval.c:3186:13 
[opt]
frame #18: 0x00010018a326 libR.dylib`bcEval(body=, 
rho=0x0001129d60e8, useCache=) at eval.c:6765:14 [opt]
frame #19: 0x000100182aed libR.dylib`Rf_eval(e=, 
rho=) at eval.c:620:8 [opt]
frame #20: 0x0001001a13a9 
libR.dylib`R_execClosure(call=0x00010990aa40, newrho=, 
sysparent=, rho=0x0001129e6f58, arglist=, 
op=) at eval.c:0:19 [opt]
frame #21: 0x0001001a02aa 
libR.dylib`Rf_applyClosure(call=0x00010990aa40, op=0x0001028046d8, 
arglist=0x0001129d9ea8, rho=0x0001129e6f58, 
suppliedvars=0x0001018058e0) at eval.c:1706:16 [opt]
frame #22: 0x000100189f11 libR.dylib`bcEval(body=, 
rho=0x0001129e6f58, useCache=) at eval.c:6733:12 [opt]
frame #23: 0x000100182aed libR.dylib`Rf_eval(e=, 
rho=) at eval.c:620:8 [opt]
frame #24: 0x00010019fa63 libR.dylib`forcePromise(e=0x0001129d9310) 
at eval.c:516:8 [opt]
frame #25: 0x0001001aa2ec libR.dylib`getvar [inlined] 
FORCE_PROMISE(value=, symbol=, rho=, 
keepmiss=) at eval.c:4897:15 [opt]
frame #26: 0x0001001aa2e4 libR.dylib`getvar(symbol=0x000101851fc8, 
rho=0x0001129d97e0, dd=, keepmiss=, 
vcache=, sidx=, stack_base=0x000100b04ff0) at 
eval.c:4970 [opt]
frame #27: 0x000100187094 libR.dylib`bcEval(body=, 
rho=0x0001129d97e0, useCache=) at eval.c:6517:20 [opt]
frame #28: 0x000100182aed libR.dylib`Rf_eval(e=, 
rho=) at eval.c:620:8 [opt]
frame #29: 0x0001001a13a9 
libR.dylib`R_execClosure(call=0x00010990ab20, newrho=, 
sysparent=, rho=0x0001129e6f58, arglist=, 
op=) at eval.c:0:19 [opt]
frame #30: 0x0001001a02aa 
libR.dylib`Rf_applyClosure(call=0x00010990ab20, op=0x0001017539b8, 
arglist=0x0001129d9348, rho=0x0001129e6f58, 
suppliedvars=0x0001018058e0) at eval.c:1706:16 [opt]
frame #31: 0x000100189f11 libR.dylib`bcEval(body=, 
rho=0x0001129e6f58, useCache=) at eval.c:6733:12 [opt]
frame #32: 0x000100182aed libR.dylib`Rf_eval(e=, 
rho=) at eval.c:620:8 [opt]
frame #33: 0x00010019fa63 libR.dylib`forcePromise(e=0x0001129daee8) 
at eval.c:516:8 [opt]
frame #34: 0x0001001aa2ec 

[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958305#comment-16958305
 ] 

Neal Richardson commented on ARROW-6977:


It most often fails in the CSV reader, which itself has multithreading 
(recently revised?), and when the data is pulled from Arrow into an R 
data.frame, it also uses threads.

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958304#comment-16958304
 ] 

Antoine Pitrou commented on ARROW-6977:
---

Also, it would be nice if you could give a bit of context. (What is the test 
doing? Can you run the tests in verbose mode to see where it's crashing?)

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Closed] (ARROW-6968) [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError

2019-10-23 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6968.
---
Resolution: Won't Fix

> [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError
> -
>
> Key: ARROW-6968
> URL: https://issues.apache.org/jira/browse/ARROW-6968
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.0
> Environment: Python 3.7.4 on macOS Mojave 10.14.6
> Python 3.6.7 on Ubuntu 16.04.6 LTS
>Reporter: Michael Wheeler
>Priority: Major
> Attachments: attribute_error_pyarrow_0_15_0.py
>
>
> The code in question:
> {code:java}
> """
> Reproduce AttributeError with PyArrow == 0.15.0
> """
> import io
> import logging
> import pandas
> import pyarrow
> import sys
> import textwrap
> logging.basicConfig(level=logging.DEBUG)
> logging.debug(f'Python v{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')
> logging.debug(f'PyArrow v{pyarrow.__version__}' + '\n')
> CSV_TEXT = textwrap.dedent("""\
>   id,gender,some_date,age
>   001,M,01/01/2019,75
>   002,F,02/02/2018,32
>   003,M,03/03/2017,27
>   004,F,04/04/2016,19
>   005,M,05/05/2015,55
>   006,F,06/06/2014,42
>   """)
> # Initialize pyarrow table via pandas
> mock_file = io.StringIO(CSV_TEXT)
> df = pandas.read_csv(mock_file).sort_values(['age', 'gender'])
> table = pyarrow.Table.from_pandas(df=df)
> # This comprehension generates a map between the name of the column and its index
> map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
> logging.debug('The column indices are:')
> for name, index in map_col_names_to_incides.items():
>     logging.debug(f'Col {name} -> #{index}')
> {code}
>  
> Expected result (generated with 0.14.1):
> {code:java}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.14.1
> DEBUG:root:The column indices are:
> DEBUG:root:Col id -> #0
> DEBUG:root:Col gender -> #1
> DEBUG:root:Col some_date -> #2
> DEBUG:root:Col age -> #3
> DEBUG:root:Col __index_level_0__ -> #4
> {code}
> Actual result (generated with 0.15.0):
> {code:java}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.15.0
> Traceback (most recent call last):
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1758, in 
> main()
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1752, in main
> globals = debugger.run(setup['file'], None, None, is_module)
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1147, in run
> pydev_imports.execfile(file, globals, locals)  # execute the script
>   File 
> "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py",
>  line 18, in execfile
> exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File 
> "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", 
> line 31, in 
> map_col_names_to_incides = {item.name: table.columns.index(item) for item 
> in table.columns}
>   File 
> "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", 
> line 31, in 
> map_col_names_to_incides = {item.name: table.columns.index(item) for item 
> in table.columns}
> AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'
> {code}
>  
> This error occurs in both of the environments specified above.





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958302#comment-16958302
 ] 

Antoine Pitrou commented on ARROW-6977:
---

Could you post the backtrace for all threads? Something like "thread apply all 
bt" should do.

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> : option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958299#comment-16958299
 ] 

Neal Richardson commented on ARROW-6977:


With lldb:

{code}
Assertion failed: (ec == 0), function unlock, file 
/BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-400.9.4/src/mutex.cpp,
 line 48.
Process 36128 stopped
* thread #10, stop reason = signal SIGABRT
frame #0: 0x7fff729182c6 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff729182c6 <+10>: jae    0x7fff729182d0 ; <+20>
    0x7fff729182c8 <+12>: movq   %rax, %rdi
    0x7fff729182cb <+15>: jmp    0x7fff72912457 ; cerror_nocancel
    0x7fff729182d0 <+20>: retq
Target 0: (R) stopped.
(lldb) bt
* thread #10, stop reason = signal SIGABRT
  * frame #0: 0x7fff729182c6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff729cdbf1 libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x7fff728826a6 libsystem_c.dylib`abort + 127
frame #3: 0x7fff7284b20d libsystem_c.dylib`__assert_rtn + 324
frame #4: 0x7fff6f7e79a4 libc++.1.dylib`std::__1::mutex::unlock() + 46
frame #5: 0x00010a8d991a 
libarrow.100.dylib`std::__1::unique_lock::~unique_lock(this=0x7935db70)
 at __mutex_base:153:19
frame #6: 0x00010a8d9715 
libarrow.100.dylib`std::__1::unique_lock::~unique_lock(this=0x7935db70)
 at __mutex_base:151:5
frame #7: 0x00010a8d96c1 
libarrow.100.dylib`arrow::internal::ThreadedTaskGroup::OneTaskDone(this=0x000102373b00)
 at task_group.cc:152:5
frame #8: 0x00010a8dbe5f 
libarrow.100.dylib`arrow::internal::ThreadedTaskGroup::AppendReal(this=0x00010232cfe0)>)::'lambda'()::operator()()
 const at task_group.cc:97:9
frame #9: 0x00010a8dbd9d 
libarrow.100.dylib`decltype(__f=0x00010232cfe0)>)::'lambda'()&>(fp)()) 
std::__1::__invoke)::'lambda'()&>(arrow::internal::ThreadedTaskGroup::AppendReal(std::__1::function)::'lambda'()&) at type_traits:4361:1
frame #10: 0x00010a8dbd4d libarrow.100.dylib`void 
std::__1::__invoke_void_return_wrapper::__call)::'lambda'()&>(arrow::internal::ThreadedTaskGroup::AppendReal(std::__1::function)::'lambda'()&) at __functional_base:349:9
frame #11: 0x00010a8dbd1d 
libarrow.100.dylib`std::__1::__function::__alloc_func)::'lambda'(), 
std::__1::allocator)::'lambda'()>, void ()>::operator(this=0x00010232cfe0)() at 
functional:1527:16
frame #12: 0x00010a8daa59 
libarrow.100.dylib`std::__1::__function::__func)::'lambda'(), 
std::__1::allocator)::'lambda'()>, void ()>::operator(this=0x00010232cfd0)() at 
functional:1651:12
frame #13: 0x00010a8e5185 
libarrow.100.dylib`std::__1::__function::__value_func::operator(this=0x7935ddf0)() const at functional:1799:16
frame #14: 0x00010a8e4d35 libarrow.100.dylib`std::__1::function::operator(this=0x7935ddf0)() const at functional:2347:12
frame #15: 0x00010a8e46fa 
libarrow.100.dylib`arrow::internal::WorkerLoop(state=std::__1::shared_ptr::element_type
 @ 0x00010092e548 strong=17 weak=1, it=std::__1::list >::iterator @ 0x7935dde8) at 
thread_pool.cc:88:9
frame #16: 0x00010a8e4451 
libarrow.100.dylib`arrow::internal::ThreadPool::LaunchWorkersUnlocked(this=0x000100a1cde8)::$_1::operator()()
 const at thread_pool.cc:225:37
frame #17: 0x00010a8e43cd 
libarrow.100.dylib`decltype(__f=0x000100a1cde8)::$_1>(fp)()) 
std::__1::__invoke(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_1&&)
 at type_traits:4361:1
frame #18: 0x00010a8e4335 libarrow.100.dylib`void 
std::__1::__thread_execute >, 
arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_1>(__t=size=2, 
(null)=__tuple_indices<> @ 0x7935deb8)::$_1>&, 
std::__1::__tuple_indices<>) at thread:342:5
frame #19: 0x00010a8e3b16 libarrow.100.dylib`void* 
std::__1::__thread_proxy >, 
arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_1> 
>(__vp=0x000100a1cde0) at thread:352:5
frame #20: 0x7fff729cb2eb libsystem_pthread.dylib`_pthread_body + 126
frame #21: 0x7fff729ce249 libsystem_pthread.dylib`_pthread_start + 66
frame #22: 0x7fff729ca40d libsystem_pthread.dylib`thread_start + 13
{code}


[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958295#comment-16958295
 ] 

Antoine Pitrou commented on ARROW-6977:
---

You can try running the executable using "gdb --args".

Another solution is to enable core dumps (perhaps "ulimit -c unlimited") and 
then run gdb on the core dump, like this: "gdb executable_file core_file".

Once inside gdb, use "run" to start the application and then "bt" to get a 
backtrace.  If debugging a core dump, you only need "bt".







[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958294#comment-16958294
 ] 

Neal Richardson commented on ARROW-6977:


Update: I wiped my build dir and rebuilt with {{-DARROW_JEMALLOC=OFF}}. I no 
longer get the warning message, but I'm still able to trigger this abort. So it 
seems there's something new in master that triggers this, but it may not be 
jemalloc background thread.






[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958290#comment-16958290
 ] 

Neal Richardson commented on ARROW-6977:


With some handholding, I'm sure I could.






[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958288#comment-16958288
 ] 

Antoine Pitrou commented on ARROW-6977:
---

[~npr] can you produce a gdb backtrace for that error?






[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958287#comment-16958287
 ] 

Antoine Pitrou commented on ARROW-6977:
---

The error message is misleading: it is really about a different system call 
being missing. Unless we can find a reliable version check, disabling the 
background thread on macOS may be the safest course of action. [~uwe]







[jira] [Created] (ARROW-6982) [R] Add bindings for compare and boolean kernels

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6982:
--

 Summary: [R] Add bindings for compare and boolean kernels
 Key: ARROW-6982
 URL: https://issues.apache.org/jira/browse/ARROW-6982
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Romain Francois
 Fix For: 1.0.0


See cpp/src/arrow/compute/kernels/compare.h and boolean.h. ARROW-6980 
introduces an Expression class that works on Arrow Arrays, but to evaluate the 
expressions, it has to pull the data into R first. Bindings for these kernels 
would let us do the work in C++ and only pull in the result.





[jira] [Updated] (ARROW-6960) [R] Add support for more compression codecs in Windows build

2019-10-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-6960:
---
Summary: [R] Add support for more compression codecs in Windows build  
(was: [R] Add information about zstd/lz4 codec installation and linkages for R 
users)

> [R] Add support for more compression codecs in Windows build
> 
>
> Key: ARROW-6960
> URL: https://issues.apache.org/jira/browse/ARROW-6960
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.15.0
> Environment: Windows 10
>Reporter: Grant Nguyen
>Priority: Minor
>
> When I attempt to write a parquet file using lz4, zstd, or brotli compression 
> using R arrow 0.15.0, I am unable to do so due to the codec support not being 
> built (example below).
>  
> {code:java}
> > arrow::write_parquet(payout_strategy, sink = 
> > "records_test_lz4.parquet",compression = "lz4")
> Error in parquet___arrow___FileWriter__WriteTable(self, table, chunk_size) : 
>  Arrow error: IOError: Arrow error: NotImplemented: LZ4 codec support not 
> built{code}
>  
> I believe that the error is generated through 
> [https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.cc#L124-L145],
>  but I am not sure how to call 
> {code:java}
> install.packages("arrow"){code}
> in R to enable the ARROW_WITH_ZSTD/LZ4/BROTLI flags, or whether I should be 
> installing zstd separately from arrow and then doing something pre- or 
> post-install to link zstd with arrow. From 
> [https://github.com/apache/arrow/issues/1209], it appears that zstd support 
> has been added to arrow and parquet in general, and the R package readme 
> ([https://github.com/apache/arrow/tree/master/r]) 
> notes "On macOS and Windows, installing a binary package from CRAN will 
> handle Arrow's C++ dependencies for you", but I get the sense that does not 
> apply to zstd.
>  
> Is there guidance as to how to enable zstd and other compression codecs prior 
> to or after downloading the R arrow package? Could this be added to the R 
> documentation somewhere for future reference?





[jira] [Created] (ARROW-6981) [R] Implement HDFS file-system interface in R

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6981:
--

 Summary: [R] Implement HDFS file-system interface in R
 Key: ARROW-6981
 URL: https://issues.apache.org/jira/browse/ARROW-6981
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson








[jira] [Commented] (ARROW-3750) [R] Pass various wrapped Arrow objects created in Python into R with zero copy via reticulate

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958255#comment-16958255
 ] 

Neal Richardson commented on ARROW-3750:


https://github.com/pitrou/arrow/pull/5 is our proof-of-concept using the C API. 
Once the protocol is approved we can move ahead with it.

> [R] Pass various wrapped Arrow objects created in Python into R with zero 
> copy via reticulate
> -
>
> Key: ARROW-3750
> URL: https://issues.apache.org/jira/browse/ARROW-3750
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> A user may wish to use some functionality available only in pyarrow using 
> reticulate; it would be useful to be able to construct an R wrapper object to 
> the C++ object inside the corresponding Python type, e.g. {{pyarrow.Table}}. 
> This probably will require some new functions to return the memory address of 
> the shared_ptr/unique_ptr inside the Cython types so that a function on the R 
> side can copy the smart pointer and create the corresponding R wrapper type
> cc [~pitrou]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958253#comment-16958253
 ] 

Wes McKinney commented on ARROW-6977:
-

Not having pthread seems a bit weird to me; I'm not sure what that is all 
about.






[jira] [Assigned] (ARROW-3750) [R] Pass various wrapped Arrow objects created in Python into R with zero copy via reticulate

2019-10-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-3750:
--

Assignee: Neal Richardson






[jira] [Commented] (ARROW-3783) [R] Incorrect collection of float type

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958252#comment-16958252
 ] 

Neal Richardson commented on ARROW-3783:


[~javierluraschi] is this still an issue? I don't have spark locally, but this 
works now:

{code}
> Array$create(1L, type=float32())
Array

[
  1
]
{code}

It looks like halffloat isn't supported, but that sounds like a different issue:

{code}
> Array$create(1L, type=float16())
Error in Array__from_vector(x, type) : 
  NotImplemented: type not implemented
{code}

> [R] Incorrect collection of float type
> --
>
> Key: ARROW-3783
> URL: https://issues.apache.org/jira/browse/ARROW-3783
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Javier Luraschi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Repro from `sparklyr`:
>  
> {code:java}
> library(sparklyr)
> library(arrow)
> sc <- spark_connect(master = "local")
> DBI::dbGetQuery(sc, "SELECT cast(1 as float)"){code}
>  
> Actual:
> {code:java}
>   CAST(1 AS FLOAT)
> 1   1065353216{code}
> Expected:
>  
> {code:java}
>   CAST(1 AS FLOAT)
> 1                1{code}
>  
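
The value in the "Actual" output above is consistent with the bits of a 32-bit float being read as an integer: 1065353216 is 0x3F800000, the IEEE 754 single-precision encoding of 1.0. The snippet below is only a sketch of that observation using Python's standard struct module; it does not show where the misinterpretation happens in sparklyr or arrow, and the function names are made up for illustration.

```python
import struct


def int32_bits_as_float(value: int) -> float:
    """Reinterpret the 4 bytes of a signed 32-bit integer as an IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


def float_bits_as_int32(value: float) -> int:
    """Reinterpret the 4 bytes of a 32-bit float as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


print(hex(1065353216))                  # 0x3f800000
print(int32_bits_as_float(1065353216))  # 1.0
print(float_bits_as_int32(1.0))         # 1065353216
```

If that reading is right, the data itself arrives intact and only the type used to interpret the buffer differs, which would fit the float32 example in the comment above now returning 1.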





[jira] [Updated] (ARROW-3783) [R] Incorrect collection of float type

2019-10-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-3783:
---
Issue Type: Bug  (was: Improvement)






[jira] [Updated] (ARROW-6980) [R] dplyr backend for RecordBatch/Table

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6980:
--
Labels: pull-request-available  (was: )

> [R] dplyr backend for RecordBatch/Table
> ---
>
> Key: ARROW-6980
> URL: https://issues.apache.org/jira/browse/ARROW-6980
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Created] (ARROW-6980) [R] dplyr backend for RecordBatch/Table

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6980:
--

 Summary: [R] dplyr backend for RecordBatch/Table
 Key: ARROW-6980
 URL: https://issues.apache.org/jira/browse/ARROW-6980
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958219#comment-16958219
 ] 

Neal Richardson commented on ARROW-6977:


That said, I am experiencing this error occasionally today while running the R 
test suite locally. This is on master:

{code}
...
CsvTableReader: ..Assertion failed: (ec == 0), function unlock, file 
/BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-400.9.4/src/mutex.cpp,
 line 48.
/bin/sh: line 1: 59468 Abort trap: 6   R --slave -e 'library(testthat); 
setwd(file.path(.libPaths()[1], "arrow", "tests")); 
system.time(test_check("arrow", filter="", reporter=ifelse(nchar(""), "", 
"summary")))'
make: *** [test] Error 134
{code}






[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958216#comment-16958216
 ] 

Antoine Pitrou commented on ARROW-6977:
---

Ah... can you run the tests fine? Also the C++ tests.






[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958215#comment-16958215
 ] 

Neal Richardson commented on ARROW-6977:


It's just a message, it doesn't appear to error, at least not immediately.

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> <jemalloc>: option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Comment Edited] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958214#comment-16958214
 ] 

Antoine Pitrou edited comment on ARROW-6977 at 10/23/19 8:14 PM:
-

I wonder why this didn't come up on CI. Is the macOS version newer on Travis? [~kszucs] 


was (Author: pitrou):
I wonder why this didn't come up on CI. Is the macOS version newer on Travis? [~kszucs]

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> <jemalloc>: option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Updated] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6977:
--
Fix Version/s: 0.15.1

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0, 0.15.1
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> <jemalloc>: option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958214#comment-16958214
 ] 

Antoine Pitrou commented on ARROW-6977:
---

I wonder why this didn't come up on CI. Is the macOS version newer on Travis? [~kszucs]

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> <jemalloc>: option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958212#comment-16958212
 ] 

Neal Richardson commented on ARROW-6977:


I'm not sure, but it sounds like the kind of thing we'll hear bug reports about 
if we don't. 

For the CRAN R packages, it's not an issue because the macOS and Windows 
binaries are built with jemalloc disabled:

* 
https://github.com/apache/arrow/blob/59a6788c76330cf055bdbcbc7bdae7b0106c6656/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb#L47
* 
https://github.com/apache/arrow/blob/59a6788c76330cf055bdbcbc7bdae7b0106c6656/ci/PKGBUILD#L85


> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> <jemalloc>: option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Commented] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958205#comment-16958205
 ] 

Antoine Pitrou commented on ARROW-6977:
---

This sounds critical to get in for 0.15.1, right?

> [C++] Only enable jemalloc background_thread if feature is supported
> 
>
> Key: ARROW-6977
> URL: https://issues.apache.org/jira/browse/ARROW-6977
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: macOS 10.14, Homebrew
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> Followup to ARROW-6910. When loading the R package after that patch merged, I 
> get this new message:
> {code}
> $ R
> > library(arrow)
> <jemalloc>: option background_thread currently supports pthread only
> {code}
> https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
>  is where the message comes from. Tracing that further, 
> {{have_background_thread}} comes from 
> https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
>  which gets set in {{configure.ac}} here: 
> https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157
> In sum, on my system, that flag doesn't get set, so 
> {{have_background_thread}} is false, and when that is false and the 
> {{background_thread}} option is true, I get that message printed. And I do 
> not want to see that message.
> cc [~wesm]





[jira] [Created] (ARROW-6979) [R] Enable jemalloc in autobrew formula

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6979:
--

 Summary: [R] Enable jemalloc in autobrew formula
 Key: ARROW-6979
 URL: https://issues.apache.org/jira/browse/ARROW-6979
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


See 
https://github.com/apache/arrow/blob/59a6788c76330cf055bdbcbc7bdae7b0106c6656/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb#L47





[jira] [Updated] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner::ToTable

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6964:
--
Labels: dataset pull-request-available  (was: dataset)

> [C++][Dataset] Expose a nested parallel option for Scanner::ToTable
> ---
>
> Key: ARROW-6964
> URL: https://issues.apache.org/jira/browse/ARROW-6964
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>






[jira] [Created] (ARROW-6978) [R] Add bindings for sum and mean compute kernels

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6978:
--

 Summary: [R] Add bindings for sum and mean compute kernels
 Key: ARROW-6978
 URL: https://issues.apache.org/jira/browse/ARROW-6978
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Romain Francois
 Fix For: 1.0.0








[jira] [Created] (ARROW-6977) [C++] Only enable jemalloc background_thread if feature is supported

2019-10-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6977:
--

 Summary: [C++] Only enable jemalloc background_thread if feature 
is supported
 Key: ARROW-6977
 URL: https://issues.apache.org/jira/browse/ARROW-6977
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
 Environment: macOS 10.14, Homebrew
Reporter: Neal Richardson
 Fix For: 1.0.0


Followup to ARROW-6910. When loading the R package after that patch merged, I 
get this new message:

{code}
$ R
> library(arrow)
<jemalloc>: option background_thread currently supports pthread only

{code}

https://github.com/jemalloc/jemalloc/blob/3d84bd57f4954a17059bd31330ec87d3c1876411/src/background_thread.c#L884-L887
 is where the message comes from. Tracing that further, 
{{have_background_thread}} comes from 
https://github.com/jemalloc/jemalloc/blob/21cfe59ff7b10a61dabe26cd3dbfb7a255e1f5e8/include/jemalloc/internal/jemalloc_preamble.h.in#L205-L211,
 which gets set in {{configure.ac}} here: 
https://github.com/jemalloc/jemalloc/blob/d2dddfb82aac9f2212922eb90324e84790704bfe/configure.ac#L2155-L2157

In sum, on my system, that flag doesn't get set, so {{have_background_thread}} 
is false, and when that is false and the {{background_thread}} option is true, 
I get that message printed. And I do not want to see that message.
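
The condition described above can be sketched roughly as follows. This is only an illustration in Python, not Arrow's or jemalloc's actual code: the helper names build_malloc_conf and warns_at_startup are hypothetical, and the base option used in the example is just a placeholder. The idea is to add the background_thread option to the configuration string only when the build reports that the feature is available.

```python
def build_malloc_conf(base_options, have_background_thread):
    """Hypothetical helper: build a jemalloc-style option string.

    background_thread is only requested when the feature is supported.
    """
    options = list(base_options)
    if have_background_thread:
        options.append("background_thread:true")
    return ",".join(options)


def warns_at_startup(conf, have_background_thread):
    """Model of the reported behaviour: option requested, feature missing."""
    requested = "background_thread:true" in conf.split(",")
    return requested and not have_background_thread


# Unconditional request on a build without the feature: message is printed.
always_on = "dirty_decay_ms:1000,background_thread:true"
print(warns_at_startup(always_on, have_background_thread=False))

# Gated request on the same build: nothing to complain about.
gated = build_malloc_conf(["dirty_decay_ms:1000"], have_background_thread=False)
print(warns_at_startup(gated, have_background_thread=False))
```

With the gate in place the option is never requested on a build that lacks the feature, so the message has no reason to appear.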

cc [~wesm]





[jira] [Commented] (ARROW-6960) [R] Add information about zstd/lz4 codec installation and linkages for R users

2019-10-23 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958064#comment-16958064
 ] 

Neal Richardson commented on ARROW-6960:


Sounds good. Once you have lz4 working, if you want to move on to zstd, you 
could start by copying 
https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-zstd/PKGBUILD to 
rtools-packages (fork it and open a PR adding it). Appveyor will test it for 
you, and Jeroen can help you with the details. 
https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-brotli/PKGBUILD 
exists too but looks a little more involved because you'd probably want to 
prune the Python-specific build targets. 

> [R] Add information about zstd/lz4 codec installation and linkages for R users
> --
>
> Key: ARROW-6960
> URL: https://issues.apache.org/jira/browse/ARROW-6960
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.15.0
> Environment: Windows 10
>Reporter: Grant Nguyen
>Priority: Minor
>
> When I attempt to write a parquet file using lz4, zstd, or brotli compression 
> using R arrow 0.15.0, I am unable to do so due to the codec support not being 
> built (example below).
>  
> {code:java}
> > arrow::write_parquet(payout_strategy, sink = "records_test_lz4.parquet", compression = "lz4")
> Error in parquet___arrow___FileWriter__WriteTable(self, table, chunk_size) : 
>  Arrow error: IOError: Arrow error: NotImplemented: LZ4 codec support not built
> {code}
>  
> I believe that the error is generated through 
> [https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.cc#L124-L145],
>  but I am not sure how to call 
> {code:java}
> install.packages("arrow"){code}
> in R to enable the ARROW_WITH_ZSTD/LZ4/BROTLI flags, or whether I should be 
> installing zstd separately from arrow and then doing something pre- or 
> post-install to link zstd with arrow. From 
> [https://github.com/apache/arrow/issues/1209], it appears that zstd support 
> has been added to arrow and parquet in general, and the R package readme 
> ([https://github.com/apache/arrow/tree/master/r])
>  notes "On macOS and Windows, installing a binary package from CRAN will 
> handle Arrow's C++ dependencies for you", but I get the sense that does not 
> apply to zstd.
>  
> Is there guidance as to how to enable zstd and other compression codecs prior 
> to or after downloading the R arrow package? Could this be added to the R 
> documentation somewhere for future reference?





[jira] [Comment Edited] (ARROW-6960) [R] Add information about zstd/lz4 codec installation and linkages for R users

2019-10-23 Thread Grant Nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958058#comment-16958058
 ] 

Grant Nguyen edited comment on ARROW-6960 at 10/23/19 5:33 PM:
---

Thanks [~npr] for the very detailed explanation, that helps a lot! I will look 
into this in the next few days – the lz4 addition to PKGBUILD seems like a good 
starting point – not sure that I have quite the level of expertise to add zstd 
and brotli to rtools but will investigate further.


was (Author: gngu):
Thanks [~npr] for the very detailed explanation, that helps a lot! I will look 
into this in the next few days – the lz4 seems like a good starting point – not 
sure that I have quite the level of expertise to add zstd and brotli to rtools 
but will investigate further.

> [R] Add information about zstd/lz4 codec installation and linkages for R users
> --
>
> Key: ARROW-6960
> URL: https://issues.apache.org/jira/browse/ARROW-6960
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.15.0
> Environment: Windows 10
>Reporter: Grant Nguyen
>Priority: Minor
>
> When I attempt to write a parquet file using lz4, zstd, or brotli compression 
> using R arrow 0.15.0, I am unable to do so due to the codec support not being 
> built (example below).
>  
> {code:java}
> > arrow::write_parquet(payout_strategy, sink = "records_test_lz4.parquet", compression = "lz4")
> Error in parquet___arrow___FileWriter__WriteTable(self, table, chunk_size) : 
>  Arrow error: IOError: Arrow error: NotImplemented: LZ4 codec support not built
> {code}
>  
> I believe that the error is generated through 
> [https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.cc#L124-L145],
>  but I am not sure how to call 
> {code:java}
> install.packages("arrow"){code}
> in R to enable the ARROW_WITH_ZSTD/LZ4/BROTLI flags, or whether I should be 
> installing zstd separately from arrow and then doing something pre- or 
> post-install to link zstd with arrow. From 
> [https://github.com/apache/arrow/issues/1209], it appears that zstd support 
> has been added to arrow and parquet in general, and the R package readme 
> ([https://github.com/apache/arrow/tree/master/r])
>  notes "On macOS and Windows, installing a binary package from CRAN will 
> handle Arrow's C++ dependencies for you", but I get the sense that does not 
> apply to zstd.
>  
> Is there guidance as to how to enable zstd and other compression codecs prior 
> to or after downloading the R arrow package? Could this be added to the R 
> documentation somewhere for future reference?





[jira] [Commented] (ARROW-6960) [R] Add information about zstd/lz4 codec installation and linkages for R users

2019-10-23 Thread Grant Nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958058#comment-16958058
 ] 

Grant Nguyen commented on ARROW-6960:
-

Thanks [~npr] for the very detailed explanation, that helps a lot! I will look 
into this in the next few days – the lz4 seems like a good starting point – not 
sure that I have quite the level of expertise to add zstd and brotli to rtools 
but will investigate further.

> [R] Add information about zstd/lz4 codec installation and linkages for R users
> --
>
> Key: ARROW-6960
> URL: https://issues.apache.org/jira/browse/ARROW-6960
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.15.0
> Environment: Windows 10
>Reporter: Grant Nguyen
>Priority: Minor
>
> When I attempt to write a parquet file using lz4, zstd, or brotli compression 
> using R arrow 0.15.0, I am unable to do so due to the codec support not being 
> built (example below).
>  
> {code:java}
> > arrow::write_parquet(payout_strategy, sink = "records_test_lz4.parquet", compression = "lz4")
> Error in parquet___arrow___FileWriter__WriteTable(self, table, chunk_size) : 
>  Arrow error: IOError: Arrow error: NotImplemented: LZ4 codec support not built
> {code}
>  
> I believe that the error is generated through 
> [https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.cc#L124-L145],
>  but I am not sure how to call 
> {code:java}
> install.packages("arrow"){code}
> in R to enable the ARROW_WITH_ZSTD/LZ4/BROTLI flags, or whether I should be 
> installing zstd separately from arrow and then doing something pre- or 
> post-install to link zstd with arrow. From 
> [https://github.com/apache/arrow/issues/1209], it appears that zstd support 
> has been added to arrow and parquet in general, and the R package readme 
> ([https://github.com/apache/arrow/tree/master/r])
>  notes "On macOS and Windows, installing a binary package from CRAN will 
> handle Arrow's C++ dependencies for you", but I get the sense that does not 
> apply to zstd.
>  
> Is there guidance as to how to enable zstd and other compression codecs prior 
> to or after downloading the R arrow package? Could this be added to the R 
> documentation somewhere for future reference?





[jira] [Closed] (ARROW-6971) [Rust] Replace "RecordBatchReader" with "BatchIterator"

2019-10-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan closed ARROW-6971.
--
Fix Version/s: (was: 1.0.0)
   Resolution: Not A Bug

> [Rust] Replace "RecordBatchReader" with "BatchIterator"
> ---
>
> Key: ARROW-6971
> URL: https://issues.apache.org/jira/browse/ARROW-6971
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
>
> As part of the recent reader work we introduced 
> {code:java}
> // arrow::record_batch::RecordBatchReader{code}
> but in datafusion we have
> {code:java}
> // datafusion::physical_plan::BatchIterator
> {code}
> These two traits are almost identical (BatchIterator implements Send + Sync 
> whereas RecordBatchReader does not).  I propose we replace RecordBatchReader 
> with BatchIterator (i.e. move it to arrow, as it's generally useful outside of 
> datafusion) and update parquet and datafusion accordingly.
> [~andygrove] [~liurenjie1024] do you see any issues with this? 
>  
>  
>  





[jira] [Commented] (ARROW-6971) [Rust] Replace "RecordBatchReader" with "BatchIterator"

2019-10-23 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958012#comment-16958012
 ] 

Paddy Horan commented on ARROW-6971:


Ahh, ok then.

 

> [Rust] Replace "RecordBatchReader" with "BatchIterator"
> ---
>
> Key: ARROW-6971
> URL: https://issues.apache.org/jira/browse/ARROW-6971
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
> Fix For: 1.0.0
>
>
> As part of the recent reader work we introduced 
> {code:java}
> // arrow::record_batch::RecordBatchReader{code}
> but in datafusion we have
> {code:java}
> // datafusion::physical_plan::BatchIterator
> {code}
> These two traits are almost identical (BatchIterator implements Send + Sync 
> whereas RecordBatchReader does not).  I propose we replace RecordBatchReader 
> with BatchIterator (i.e. move it to arrow, as it's generally useful outside of 
> datafusion) and update parquet and datafusion accordingly.
> [~andygrove] [~liurenjie1024] do you see any issues with this? 
>  
>  
>  





[jira] [Assigned] (ARROW-6969) [C++][Dataset] ParquetScanTask eagerly loads file

2019-10-23 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-6969:
-

Assignee: Francois Saint-Jacques

> [C++][Dataset] ParquetScanTask eagerly loads file 
> -
>
> Key: ARROW-6969
> URL: https://issues.apache.org/jira/browse/ARROW-6969
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset
>
> The file content should only be read when invoking ParquetScanTask::Scan, not 
> on construction. Reading on construction prevents reading in a true streaming 
> fashion under memory constraints.
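
To make the difference concrete, here is a minimal sketch in Python rather than the C++ dataset API. The class names EagerScanTask and LazyScanTask and the fake_read helper are made up for illustration and do not correspond to real Arrow classes. The eager variant reads the file in its constructor, so building a list of tasks already loads every file; the lazy variant defers the read until scan() is iterated.

```python
class EagerScanTask:
    """Reads the file as soon as the task is constructed."""

    def __init__(self, path, read_file):
        self.batches = read_file(path)  # I/O happens here

    def scan(self):
        return iter(self.batches)


class LazyScanTask:
    """Only remembers how to read; I/O happens when scan() is consumed."""

    def __init__(self, path, read_file):
        self.path = path
        self.read_file = read_file

    def scan(self):
        # Generator: the file is read on the first next() call, not before.
        yield from self.read_file(self.path)


reads = []


def fake_read(path):
    """Stand-in for a Parquet reader; records which files were touched."""
    reads.append(path)
    return [f"{path}:batch0", f"{path}:batch1"]


lazy_tasks = [LazyScanTask(p, fake_read) for p in ("a.parquet", "b.parquet")]
print(reads)  # nothing has been read yet

first = list(lazy_tasks[0].scan())
print(reads)  # only the first file has been read
```

With the lazy form a consumer can process one file at a time and drop its batches before the next file is opened, which is what a memory-constrained streaming read needs.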





[jira] [Assigned] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner::ToTable

2019-10-23 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-6964:
-

Assignee: Francois Saint-Jacques

> [C++][Dataset] Expose a nested parallel option for Scanner::ToTable
> ---
>
> Key: ARROW-6964
> URL: https://issues.apache.org/jira/browse/ARROW-6964
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset
>






[jira] [Assigned] (ARROW-6950) [C++][Dataset] Add example/benchmark for reading parquet files with dataset

2019-10-23 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-6950:
-

Assignee: Francois Saint-Jacques

> [C++][Dataset] Add example/benchmark for reading parquet files with dataset
> ---
>
> Key: ARROW-6950
> URL: https://issues.apache.org/jira/browse/ARROW-6950
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Create an executable that loads a directory with a known partition scheme, 
> applying a filter and a projection. This will be used as a baseline for future 
> performance improvements but also to show various features of the dataset API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6976) Possible memory leak in pyarrow read_parquet

2019-10-23 Thread david cottrell (Jira)
david cottrell created ARROW-6976:
-

 Summary: Possible memory leak in pyarrow read_parquet
 Key: ARROW-6976
 URL: https://issues.apache.org/jira/browse/ARROW-6976
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.15.0
 Environment: linux ubuntu 18.04
Reporter: david cottrell
 Attachments: image-2019-10-23-16-17-20-739.png

 

Version and repro info in the gist below.

I'm not sure whether I'm misunderstanding something from this 
[https://arrow.apache.org/blog/2019/02/05/python-string-memory-0.12/]

but there seems to be memory accumulation that is exacerbated by higher-arity 
objects like strings and dates (not datetimes).

 

I was not able to reproduce the issue on MacOS. Downgrading to 0.14.1 seemed to 
"fix" or lessen the problem.

 

[https://gist.github.com/cottrell/a3f95aa59408d87f925ec606d8783e62]

 

Let me know if this post should go elsewhere.

!image-2019-10-23-16-17-20-739.png!

 
 

 





[jira] [Updated] (ARROW-6950) [C++][Dataset] Add example/benchmark for reading parquet files with dataset

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6950:
--
Labels: dataset pull-request-available  (was: dataset)

> [C++][Dataset] Add example/benchmark for reading parquet files with dataset
> ---
>
> Key: ARROW-6950
> URL: https://issues.apache.org/jira/browse/ARROW-6950
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
>
> Create an executable that loads a directory with a known partition scheme, 
> applying a filter and a projection. This will be used as a baseline for future 
> performance improvements but also to show various features of the dataset API.





[jira] [Updated] (ARROW-6749) [Python] Conversion of non-ns timestamp array to numpy gives wrong values

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6749:
--
Labels: pull-request-available  (was: )

> [Python] Conversion of non-ns timestamp array to numpy gives wrong values
> -
>
> Key: ARROW-6749
> URL: https://issues.apache.org/jira/browse/ARROW-6749
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> In [25]: np_arr = np.arange("2012-01-01", "2012-01-06", int(1e6)*60*60*24, dtype="datetime64[us]")
>
> In [26]: np_arr
> Out[26]: 
> array(['2012-01-01T00:00:00.00', '2012-01-02T00:00:00.00',
>        '2012-01-03T00:00:00.00', '2012-01-04T00:00:00.00',
>        '2012-01-05T00:00:00.00'], dtype='datetime64[us]')
>
> In [27]: arr = pa.array(np_arr)
>
> In [28]: arr
> Out[28]: 
> 
> [
>   2012-01-01 00:00:00.00,
>   2012-01-02 00:00:00.00,
>   2012-01-03 00:00:00.00,
>   2012-01-04 00:00:00.00,
>   2012-01-05 00:00:00.00
> ]
>
> In [29]: arr.type
> Out[29]: TimestampType(timestamp[us])
>
> In [30]: arr.to_numpy()
> Out[30]: 
> array(['1970-01-16T08:09:36.0', '1970-01-16T08:11:02.4',
>        '1970-01-16T08:12:28.8', '1970-01-16T08:13:55.2',
>        '1970-01-16T08:15:21.6'], dtype='datetime64[ns]')
> {code}
> So it seems to simply interpret the integer microsecond values as nanoseconds 
> when converting to numpy.
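
The arithmetic behind that diagnosis can be checked with the standard library alone. The sketch below does not use pyarrow or numpy, and its helper names (to_epoch_us, misread_us_as_ns, convert_us_to_ns) are invented for illustration. It takes the microsecond count for 2012-01-01, relabels it as nanoseconds without scaling, and lands on the same 1970-01-16 value shown in Out[30]; a correct conversion would multiply by 1000 first.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_us(dt):
    """Integer microseconds since the Unix epoch (what timestamp[us] stores)."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def misread_us_as_ns(value_us):
    """Relabel a microsecond count as nanoseconds without scaling it."""
    # datetime only has microsecond resolution, so express the ns count in us.
    return EPOCH + timedelta(microseconds=value_us // 1000)


def convert_us_to_ns(value_us):
    """Correct conversion: scale the integer by 1000 when changing the unit."""
    return value_us * 1000


us = to_epoch_us(datetime(2012, 1, 1, tzinfo=timezone.utc))
print(us)                                # 1325376000000000
print(misread_us_as_ns(us).isoformat())  # 1970-01-16T08:09:36+00:00

next_day = to_epoch_us(datetime(2012, 1, 2, tzinfo=timezone.utc))
print(misread_us_as_ns(next_day).isoformat())  # 1970-01-16T08:11:02.400000+00:00
```

One day is 86400 seconds, which shrinks to 86.4 seconds when the unit is misread, matching the spacing between consecutive values in the wrong output.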





[jira] [Resolved] (ARROW-6503) [C++] Add an argument of memory pool object to SparseTensorConverter

2019-10-23 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6503.
---
Resolution: Fixed

Issue resolved by pull request 5707
[https://github.com/apache/arrow/pull/5707]

> [C++] Add an argument of memory pool object to SparseTensorConverter
> 
>
> Key: ARROW-6503
> URL: https://issues.apache.org/jira/browse/ARROW-6503
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> According to the comment 
> https://github.com/apache/arrow/pull/5290#discussion_r322244745, we need 
> variants of some functions that accept a memory pool object and pass it to 
> SparseTensorConverter.





[jira] [Resolved] (ARROW-6973) [C++][ThreadPool] Use perfect forwarding in Submit

2019-10-23 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6973.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5711
[https://github.com/apache/arrow/pull/5711]

> [C++][ThreadPool] Use perfect forwarding in Submit
> --
>
> Key: ARROW-6973
> URL: https://issues.apache.org/jira/browse/ARROW-6973
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Artem Alekseev
>Assignee: Artem Alekseev
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-6975) [C++] Put make_unique in its own header

2019-10-23 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6975:
-

 Summary: [C++] Put make_unique in its own header
 Key: ARROW-6975
 URL: https://issues.apache.org/jira/browse/ARROW-6975
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


{{arrow/util/stl.h}} carries other stuff that is almost never necessary.





[jira] [Commented] (ARROW-3543) [R] Better support for timestamp format and time zones in R

2019-10-23 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957889#comment-16957889
 ] 

Wes McKinney commented on ARROW-3543:
-

There will be an update on JIRA when there is activity. 

> [R] Better support for timestamp format and time zones in R
> ---
>
> Key: ARROW-3543
> URL: https://issues.apache.org/jira/browse/ARROW-3543
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Olaf
>Priority: Major
> Fix For: 1.0.0
>
>
> See below for original description and reports. In sum, there is a mismatch 
> between how the C++ library and R interpret data without a timezone, and it 
> turns out that we're not passing the timezone to R if it is set in Arrow C++ 
> anyway. 
> The [C++ library 
> docs|http://arrow.apache.org/docs/cpp/api/datatype.html#_CPPv4N5arrow13TimestampTypeE]
>  say "If a timezone-aware field contains a recognized timezone, its values 
> may be localized to that locale upon display; the values of timezone-naive 
> fields must always be displayed “as is”, with no localization performed on 
> them." But R's print default, as well as the parsing default, is the current 
> time zone: 
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html
> The C++ library seems to parse timestamp strings that don't have timezone 
> information as if they are UTC, so when you read timezone-naive timestamps 
> from Arrow and print them in R, they are shifted to be localized to the 
> current timezone. If you print timestamp data from Arrow with 
> {{print(timestamp_var, tz="GMT")}} it would look as you expect.
> On further inspection, the [arrow-to-vector code for 
> timestamp|https://github.com/apache/arrow/blob/master/r/src/array_to_vector.cpp#L504-L514]
>  doesn't seem to consider time zone information even if it does exist. So we 
> don't have the means currently in R to display timestamp data faithfully, 
> whether or not it is timezone-aware.
> Among the tasks here:
> * Include the timezone attribute in the POSIXct R vector that gets created 
> from a timestamp Arrow array
> * Ensure that timezone-naive data from Arrow is printed in R "as is" with no 
> localization 
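
The difference between the two display policies can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and a fixed UTC-5 offset stands in for a local US/Eastern session in February, which is an assumption about the reporter's setup rather than something stated in the issue.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offset standing in for US/Eastern in winter.
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def display_localized(naive, display_tz):
    """Treat a timezone-naive value as UTC and shift it to `display_tz`,
    which is the behaviour the summary above describes for R."""
    return naive.replace(tzinfo=timezone.utc).astimezone(display_tz).replace(tzinfo=None)


def display_as_is(naive):
    """Show a timezone-naive value unchanged, as the Arrow C++ docs require."""
    return naive


stored = datetime(2018, 2, 1, 9, 0, 0, 531000)  # naive value from the report
print(display_as_is(stored))                        # 2018-02-01 09:00:00.531000
print(display_localized(stored, EASTERN_STANDARD))  # 2018-02-01 04:00:00.531000
```

The second line reproduces the five-hour shift seen in the `read_feather` output of the original report, where `09:00:00.531` is printed as `04:00:00.531`.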
> -
> Original description:
> Hello the dream team,
> Pasting from [https://github.com/wesm/feather/issues/351]
> Thanks for this wonderful package. I was playing with feather and some 
> timestamps and I noticed some dangerous behavior. Maybe it is a bug.
> Consider this
>  
> {code:java}
> import pandas as pd
> import feather
> import numpy as np
> df = pd.DataFrame(
>     {'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
>                          pd.to_datetime('2018-02-01 14:01:00.456'),
>                          pd.to_datetime('2018-03-05 14:01:02.200')]}
> )
> df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
>  Out[17]: 
>  string_time_utc timestamp_est
>  0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
>  1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
>  2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
> Here I create the corresponding `EST` timestamp of my original timestamps (in 
> `UTC` time).
> Now saving the dataframe to `csv` or to `feather` will generate two 
> completely different results.
>  
> {code:java}
> df.to_csv('P://testing.csv')
> df.to_feather('P://testing.feather')
> {code}
> Switching to R.
> Using the good old `csv` gives me something a bit annoying, but expected. R 
> thinks my timezone is `UTC` by default, and wrongly attached this timezone to 
> `timestamp_est`. No big deal, I can always use `with_tz` or even better: 
> import as character and process as timestamp while in R.
>  
> {code:java}
> > dataframe <- read_csv('P://testing.csv')
>  Parsed with column specification:
>  cols(
>  X1 = col_integer(),
>  string_time_utc = col_datetime(format = ""),
>  timestamp_est = col_datetime(format = "")
>  )
>  Warning message:
>  Missing column names filled in: 'X1' [1] 
>  > 
>  > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 4
>  X1 string_time_utc timestamp_est 
> 
>  1 0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
>  2 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
>  3 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
>  mytimezone
>   
>  1 UTC 
>  2 UTC 
>  3 UTC  {code}
> {code:java}
> #Now look at what happens with feather:
>  
>  > dataframe <- read_feather('P://testing.feather')
>  > 
>  > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 3
>  string_time_utc timestamp_est mytimezone
> 
>  1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 "" 
>  2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 "" 
>  3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 "" {code}
> My timestamps have been 

[jira] [Commented] (ARROW-3543) [R] Better support for timestamp format and time zones in R

2019-10-23 Thread Shannon C Lewis (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957884#comment-16957884
 ] 

Shannon C Lewis commented on ARROW-3543:


Just checking in: are there any updates on this?

> [R] Better support for timestamp format and time zones in R
> ---
>
> Key: ARROW-3543
> URL: https://issues.apache.org/jira/browse/ARROW-3543
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Olaf
>Priority: Major
> Fix For: 1.0.0
>
>
> See below for original description and reports. In sum, there is a mismatch 
> between how the C++ library and R interpret data without a timezone, and it 
> turns out that we're not passing the timezone to R if it is set in Arrow C++ 
> anyway. 
> The [C++ library 
> docs|http://arrow.apache.org/docs/cpp/api/datatype.html#_CPPv4N5arrow13TimestampTypeE]
>  say "If a timezone-aware field contains a recognized timezone, its values 
> may be localized to that locale upon display; the values of timezone-naive 
> fields must always be displayed “as is”, with no localization performed on 
> them." But R's print default, as well as the parsing default, is the current 
> time zone: 
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html
> The C++ library seems to parse timestamp strings that don't have timezone 
> information as if they are UTC, so when you read timezone-naive timestamps 
> from Arrow and print them in R, they are shifted to be localized to the 
> current timezone. If you print timestamp data from Arrow with 
> {{print(timestamp_var, tz="GMT")}} it would look as you expect.
> On further inspection, the [arrow-to-vector code for 
> timestamp|https://github.com/apache/arrow/blob/master/r/src/array_to_vector.cpp#L504-L514]
>  doesn't seem to consider time zone information even if it does exist. So we 
> don't have the means currently in R to display timestamp data faithfully, 
> whether or not it is timezone-aware.
> Among the tasks here:
> * Include the timezone attribute in the POSIXct R vector that gets created 
> from a timestamp Arrow array
> * Ensure that timezone-naive data from Arrow is printed in R "as is" with no 
> localization 
> -
> Original description:
> Hello the dream team,
> Pasting from [https://github.com/wesm/feather/issues/351]
> Thanks for this wonderful package. I was playing with feather and some 
> timestamps and I noticed some dangerous behavior. Maybe it is a bug.
> Consider this
>  
> {code:java}
> import pandas as pd
> import feather
> import numpy as np
> df = pd.DataFrame(
>     {'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
>                          pd.to_datetime('2018-02-01 14:01:00.456'),
>                          pd.to_datetime('2018-03-05 14:01:02.200')]}
> )
> df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
>  Out[17]: 
>  string_time_utc timestamp_est
>  0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
>  1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
>  2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
> Here I create the corresponding `EST` timestamp of my original timestamps (in 
> `UTC` time).
> Now saving the dataframe to `csv` or to `feather` will generate two 
> completely different results.
>  
> {code:java}
> df.to_csv('P://testing.csv')
> df.to_feather('P://testing.feather')
> {code}
> Switching to R.
> Using the good old `csv` gives me something a bit annoying, but expected. R 
> thinks my timezone is `UTC` by default, and wrongly attached this timezone to 
> `timestamp_est`. No big deal, I can always use `with_tz` or even better: 
> import as character and process as timestamp while in R.
>  
> {code:java}
> > dataframe <- read_csv('P://testing.csv')
>  Parsed with column specification:
>  cols(
>  X1 = col_integer(),
>  string_time_utc = col_datetime(format = ""),
>  timestamp_est = col_datetime(format = "")
>  )
>  Warning message:
>  Missing column names filled in: 'X1' [1] 
>  > 
>  > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 4
>  X1 string_time_utc timestamp_est 
> 
>  1 0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
>  2 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
>  3 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
>  mytimezone
>   
>  1 UTC 
>  2 UTC 
>  3 UTC  {code}
> {code:java}
> #Now look at what happens with feather:
>  
>  > dataframe <- read_feather('P://testing.feather')
>  > 
>  > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 3
>  string_time_utc timestamp_est mytimezone
> 
>  1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 "" 
>  2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 "" 
>  3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 "" {code}
> My timestamps have been converted!!! pure 

[jira] [Created] (ARROW-6974) [C++] Implement Cast kernel for time-likes with ArrayDataVisitor pattern

2019-10-23 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6974:


 Summary: [C++] Implement Cast kernel for time-likes with 
ArrayDataVisitor pattern
 Key: ARROW-6974
 URL: https://issues.apache.org/jira/browse/ARROW-6974
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Joris Van den Bossche


Currently, the casting for time-like data is done with the {{ShiftTime}} 
function. It _might_ be possible to simplify this with ArrayDataVisitor (to 
avoid looping / checking the bitmap).





[jira] [Updated] (ARROW-6958) [Python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-6958:
-
Summary: [Python] tutorial script for arrow in spark throws error  (was: 
[python] tutorial script for arrow in spark throws error)

> [Python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Closed] (ARROW-6958) [Python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche closed ARROW-6958.

Resolution: Not A Problem

> [Python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Updated] (ARROW-6958) [python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-6958:
-
Fix Version/s: (was: 0.8.0)

> [python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Reopened] (ARROW-6958) [python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche reopened ARROW-6958:
--

> [python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Fix For: 0.8.0
>
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Commented] (ARROW-6958) [python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957749#comment-16957749
 ] 

Joris Van den Bossche commented on ARROW-6958:
--

The relevant spark issue is https://issues.apache.org/jira/browse/SPARK-29367

> [python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Fix For: 0.8.0
>
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Closed] (ARROW-6958) [python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche closed ARROW-6958.

Resolution: Not A Problem

> [python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Fix For: 0.8.0
>
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Comment Edited] (ARROW-6958) [python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957745#comment-16957745
 ] 

Joris Van den Bossche edited comment on ARROW-6958 at 10/23/19 10:43 AM:
-

pyspark is not yet compatible with the latest pyarrow version (0.15.0); see e.g. 
https://stackoverflow.com/questions/58273063/pandasudf-and-pyarrow-0-15-0 for 
an explanation and how to solve it. I suppose you are encountering the same 
issue.


was (Author: jorisvandenbossche):
pyspark is not yet compatible with the latest pyarrow 0.15.0 version, see eg 
https://stackoverflow.com/questions/58273063/pandasudf-and-pyarrow-0-15-0 for 
an explanation

> [python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Fix For: 0.8.0
>
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Commented] (ARROW-6958) [python] tutorial script for arrow in spark throws error

2019-10-23 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957745#comment-16957745
 ] 

Joris Van den Bossche commented on ARROW-6958:
--

pyspark is not yet compatible with the latest pyarrow version (0.15.0); see e.g. 
https://stackoverflow.com/questions/58273063/pandasudf-and-pyarrow-0-15-0 for 
an explanation.

> [python] tutorial script for arrow in spark throws error
> 
>
> Key: ARROW-6958
> URL: https://issues.apache.org/jira/browse/ARROW-6958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Affects Versions: 0.15.0
> Environment: Ubuntu v 18. Cluster spun up on google dataproc - see 
> startup for specs of cluster
>Reporter: Karl Svensson
>Priority: Major
>  Labels: newbie
> Fix For: 0.8.0
>
> Attachments: arrow_error.txt, start-cluster-nl.ps1.txt
>
>
> Running the arrow example for pyspark ([found here 
> |[https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py]])
>  causes a java.lang.IllegalArgumentException error. Running the same script 
> with pyarrow v 0.8.0 causes the script to run correctly.
> Attached are the startup settings in google dataproc I'm using to create the 
> cluster, as well as the output (with error text). It isn't immediately 
> obvious to me what is causing the issue.
>  
>  





[jira] [Commented] (ARROW-6968) [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError

2019-10-23 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957733#comment-16957733
 ] 

Joris Van den Bossche commented on ARROW-6968:
--

Hi [~mwheeler-hdai], this was a backwards-incompatible change in pyarrow 
0.15.0. The {{Column}} class (a small wrapper around ChunkedArray) was removed, 
and a column of a Table is now returned as a {{ChunkedArray}}. In most cases a 
{{ChunkedArray}} behaves like the removed {{Column}} and offers similar 
functionality, but one of the differences is that {{ChunkedArray}} has no 
'name' attribute.

You could replace the

{code}
map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
{code}

with e.g.

{code}
map_col_names_to_incides = {name: i for i, name in enumerate(table.column_names)}
{code}

as the column_names are guaranteed to be in the correct order (or another 
option: {{dict(zip(table.column_names, range(table.num_columns)))}}).
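
As a quick check that the two suggestions are interchangeable, here is a minimal sketch that runs without pyarrow. The plain list `column_names` is a stand-in for `table.column_names` (the names are taken from the expected output in the report), and the two function names are hypothetical.

```python
# Stand-in for table.column_names; pyarrow itself is not needed for the check.
column_names = ["id", "gender", "some_date", "age", "__index_level_0__"]


def index_by_enumerate(names):
    """Map each column name to its position using enumerate()."""
    return {name: i for i, name in enumerate(names)}


def index_by_zip(names):
    """Same mapping, built with zip() over the names and a range of positions."""
    return dict(zip(names, range(len(names))))


mapping = index_by_enumerate(column_names)
assert mapping == index_by_zip(column_names)
for name, index in mapping.items():
    print(f"Col {name} -> #{index}")
```

One caveat for both variants: if a table has duplicate column names, the dictionary keeps only the last index for that name.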




> [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError
> -
>
> Key: ARROW-6968
> URL: https://issues.apache.org/jira/browse/ARROW-6968
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.15.0
> Environment: Python 3.7.4 on macOS Mojave 10.14.6
> Python 3.6.7 on Ubuntu 16.04.6 LTS
>Reporter: Michael Wheeler
>Priority: Major
> Attachments: attribute_error_pyarrow_0_15_0.py
>
>
> The code in question:
> {code:java}
> """
> Reproduce AttributeError with PyArrow == 0.15.0
> """
> import io
> import logging
> import pandas
> import pyarrow
> import sys
> import textwrap
> logging.basicConfig(level=logging.DEBUG)
> logging.debug(f'Python v{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')
> logging.debug(f'PyArrow v{pyarrow.__version__}' + '\n')
> CSV_TEXT = textwrap.dedent("""\
>   id,gender,some_date,age
>   001,M,01/01/2019,75
>   002,F,02/02/2018,32
>   003,M,03/03/2017,27
>   004,F,04/04/2016,19
>   005,M,05/05/2015,55
>   006,F,06/06/2014,42
>   """)
> # Initialize pyarrow table via pandas
> mock_file = io.StringIO(CSV_TEXT)
> df = pandas.read_csv(mock_file).sort_values(['age', 'gender'])
> table = pyarrow.Table.from_pandas(df=df)
> # This comprehension generates a map between the name of the column and its index
> map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
> logging.debug('The column indices are:')
> for name, index in map_col_names_to_incides.items():
>     logging.debug(f'Col {name} -> #{index}')
> {code}
>  
> Expected result (generated with 0.14.1):
> {code:java}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.14.1
> DEBUG:root:The column indices are:
> DEBUG:root:Col id -> #0
> DEBUG:root:Col gender -> #1
> DEBUG:root:Col some_date -> #2
> DEBUG:root:Col age -> #3
> DEBUG:root:Col __index_level_0__ -> #4
> {code}
> Actual result (generated with 0.15.0):
> {code:java}
> DEBUG:root:Python v3.7.4
> DEBUG:root:PyArrow v0.15.0
> Traceback (most recent call last):
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1758, in 
> main()
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1752, in main
> globals = debugger.run(setup['file'], None, None, is_module)
>   File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 
> 1147, in run
> pydev_imports.execfile(file, globals, locals)  # execute the script
>   File 
> "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py",
>  line 18, in execfile
> exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File 
> "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", 
> line 31, in 
> map_col_names_to_incides = {item.name: table.columns.index(item) for item 
> in table.columns}
>   File 
> "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", 
> line 31, in 
> map_col_names_to_incides = {item.name: table.columns.index(item) for item 
> in table.columns}
> AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'
> {code}
>  
> This error occurs in both of the environments specified above.





[jira] [Updated] (ARROW-6973) [C++][ThreadPool] Use perfect forwarding in Submit

2019-10-23 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6973:
--
Component/s: C++

> [C++][ThreadPool] Use perfect forwarding in Submit
> --
>
> Key: ARROW-6973
> URL: https://issues.apache.org/jira/browse/ARROW-6973
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Artem Alekseev
>Assignee: Artem Alekseev
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-6973) [C++][ThreadPool] Use perfect forwarding in Submit

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6973:
--
Labels: pull-request-available  (was: )

> [C++][ThreadPool] Use perfect forwarding in Submit
> --
>
> Key: ARROW-6973
> URL: https://issues.apache.org/jira/browse/ARROW-6973
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Artem Alekseev
>Assignee: Artem Alekseev
>Priority: Trivial
>  Labels: pull-request-available
>






[jira] [Assigned] (ARROW-6749) [Python] Conversion of non-ns timestamp array to numpy gives wrong values

2019-10-23 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche reassigned ARROW-6749:


Assignee: Joris Van den Bossche

> [Python] Conversion of non-ns timestamp array to numpy gives wrong values
> -
>
> Key: ARROW-6749
> URL: https://issues.apache.org/jira/browse/ARROW-6749
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>
> {code}
> In [25]: np_arr = np.arange("2012-01-01", "2012-01-06", int(1e6)*60*60*24, dtype="datetime64[us]")
> In [26]: np_arr
> Out[26]: 
> array(['2012-01-01T00:00:00.00', '2012-01-02T00:00:00.00',
>        '2012-01-03T00:00:00.00', '2012-01-04T00:00:00.00',
>        '2012-01-05T00:00:00.00'], dtype='datetime64[us]')
> In [27]: arr = pa.array(np_arr)
> In [28]: arr
> Out[28]: 
> 
> [
>   2012-01-01 00:00:00.00,
>   2012-01-02 00:00:00.00,
>   2012-01-03 00:00:00.00,
>   2012-01-04 00:00:00.00,
>   2012-01-05 00:00:00.00
> ]
> In [29]: arr.type
> Out[29]: TimestampType(timestamp[us])
> In [30]: arr.to_numpy()
> Out[30]: 
> array(['1970-01-16T08:09:36.0', '1970-01-16T08:11:02.4',
>        '1970-01-16T08:12:28.8', '1970-01-16T08:13:55.2',
>        '1970-01-16T08:15:21.6'], dtype='datetime64[ns]')
> {code}
> So it seems to simply interpret the integer microsecond values as nanoseconds 
> when converting to numpy.





[jira] [Created] (ARROW-6973) [C++][ThreadPool] Use perfect forwarding in Submit

2019-10-23 Thread Artem Alekseev (Jira)
Artem Alekseev created ARROW-6973:
-

 Summary: [C++][ThreadPool] Use perfect forwarding in Submit
 Key: ARROW-6973
 URL: https://issues.apache.org/jira/browse/ARROW-6973
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Artem Alekseev
Assignee: Artem Alekseev





