[jira] [Updated] (ARROW-18337) [R] Possible undesirable handling of POSIXlt objects
[ https://issues.apache.org/jira/browse/ARROW-18337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18337:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] Possible undesirable handling of POSIXlt objects
> ----------------------------------------------------
>
>                Key: ARROW-18337
>                URL: https://issues.apache.org/jira/browse/ARROW-18337
>            Project: Apache Arrow
>         Issue Type: Improvement
>         Components: R
>           Reporter: Danielle Navarro
>           Priority: Major
>             Labels: pull-request-available
>            Fix For: 11.0.0
>
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> In the course of updating documentation, I noticed that it is possible to
> create an Arrow array of POSIXlt objects from R, but not a scalar.
> https://github.com/apache/arrow/pull/14514#discussion_r1016078081
>
> This works:
> {code:r}
> tm <- as.POSIXlt(c(Sys.time(), Sys.time()))
> arrow::Array$create(tm)
> {code}
>
> This fails:
> {code:r}
> arrow::Scalar$create(as.POSIXlt(Sys.time()))
> {code}
>
> It's possible to manually convert a POSIXlt object to a struct scalar like
> this:
> {code:r}
> df <- as.data.frame(unclass(as.POSIXlt(Sys.time())))
> arrow::Scalar$create(df,
>   type = struct(
>     sec = float32(),
>     min = int32(),
>     hour = int32(),
>     mday = int32(),
>     mon = int32(),
>     year = int32(),
>     wday = int32(),
>     yday = int32(),
>     isdst = int32(),
>     zone = utf8(),
>     gmtoff = int32()
>   ))
> {code}
> although this does not seem precisely the same as the behaviour of
> Array$create(), which creates an extension type.
>
> It was unclear to us ([~thisisnic] and myself) whether the current behaviour
> was desirable, so it seemed sensible to open an issue!
>
> Related issue: https://issues.apache.org/jira/browse/ARROW-18263

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
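For background on why a struct type fits here, a POSIXlt value is just a bundle of named broken-down time fields. The sketch below (plain Python, not the arrow R package's actual converter; the function name is made up for illustration) decomposes an epoch timestamp into the same fields the struct type above declares, including POSIXlt's unusual 0-based months and years-since-1900 convention:

```python
import time

def posixlt_fields(epoch_seconds, tz_offset=0):
    """Return POSIXlt-style fields for a UTC epoch timestamp (illustrative)."""
    t = time.gmtime(epoch_seconds + tz_offset)
    return {
        "sec": t.tm_sec,
        "min": t.tm_min,
        "hour": t.tm_hour,
        "mday": t.tm_mday,
        "mon": t.tm_mon - 1,        # POSIXlt months are 0-based
        "year": t.tm_year - 1900,   # POSIXlt years count from 1900
        "wday": (t.tm_wday + 1) % 7,  # POSIXlt weeks start on Sunday (0)
        "yday": t.tm_yday - 1,      # POSIXlt yday is 0-based
        "isdst": t.tm_isdst,
        "zone": "UTC",
        "gmtoff": tz_offset,
    }
```

Any converter (or the extension type created by Array$create) ultimately has to round-trip exactly these eleven fields.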
[jira] [Updated] (ARROW-16480) [R] Update read_csv_arrow and open_dataset parse_options, read_options, and convert_options to take lists
[ https://issues.apache.org/jira/browse/ARROW-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-16480:
-----------------------------------
    Labels: good-first-issue good-second-issue pull-request-available  (was: good-first-issue good-second-issue)

> [R] Update read_csv_arrow and open_dataset parse_options, read_options, and
> convert_options to take lists
> ---------------------------------------------------------------------------
>
>                Key: ARROW-16480
>                URL: https://issues.apache.org/jira/browse/ARROW-16480
>            Project: Apache Arrow
>         Issue Type: Sub-task
>         Components: R
>           Reporter: Nicola Crane
>           Assignee: Nicola Crane
>           Priority: Major
>             Labels: good-first-issue, good-second-issue, pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> From a discussion on a PR which documents the encoding argument
> (https://github.com/apache/arrow/pull/13038):
>
> Currently, if we want to specify Arrow-specific read options such as encoding,
> we have to do something like this:
> {code:r}
> df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))
> {code}
> However, this uses a lower-level API that we don't want end-users to see in
> the examples.
>
> We should update the code inside {{read_csv_arrow()}} so that the user can
> specify {{read_options}} as a list which we then pass through to
> CsvReadOptions internally, allowing the much more user-friendly call below:
> {code:r}
> df <- read_csv_arrow(tf, read_options = list(encoding = "utf8"))
> {code}
> We should then add an example of this to the function doc examples.
>
> We should also do the same for parse_options and convert_options.
>
> Similarly, we can do:
> {code:r}
> open_dataset("data.csv", format = "csv",
>              convert_options = CsvConvertOptions$create(null_values = "Not Range",
>                                                         strings_can_be_null = TRUE)) %>%
>   collect()
> {code}
> but it'd be great to be able to do:
> {code:r}
> open_dataset("data.csv", format = "csv",
>              convert_options = list(null_values = "Not Range",
>                                     strings_can_be_null = TRUE)) %>%
>   collect()
> {code}
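The dispatch the ticket asks for is a small "accept an options object or a plain list/dict of option names" shim. A hedged sketch in Python (class and function names are stand-ins, not the R package's real internals):

```python
class CsvReadOptions:
    """Stand-in for the lower-level options class."""
    def __init__(self, encoding="utf8", skip_rows=0):
        self.encoding = encoding
        self.skip_rows = skip_rows

def resolve_read_options(read_options):
    """Accept None, a ready-made options object, or a plain mapping."""
    if read_options is None:
        return CsvReadOptions()
    if isinstance(read_options, CsvReadOptions):
        return read_options  # lower-level API object, passed through untouched
    if isinstance(read_options, dict):
        # user-friendly list/dict form: forward entries as keyword arguments
        return CsvReadOptions(**read_options)
    raise TypeError("read_options must be CsvReadOptions, dict, or None")
```

Keeping the object form working means the change is backwards compatible: existing callers of CsvReadOptions$create are unaffected, while new callers can pass a bare list.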
[jira] [Updated] (ARROW-18377) MIGRATION: Automate component labels from issue form content
[ https://issues.apache.org/jira/browse/ARROW-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18377:
-----------------------------------
    Labels: gh-migration pull-request-available  (was: gh-migration)

> MIGRATION: Automate component labels from issue form content
> ------------------------------------------------------------
>
>                Key: ARROW-18377
>                URL: https://issues.apache.org/jira/browse/ARROW-18377
>            Project: Apache Arrow
>         Issue Type: Task
>           Reporter: Todd Farmer
>           Assignee: Jacob Wujciak
>           Priority: Major
>             Labels: gh-migration, pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> ARROW-18364 added the ability to report issues in GitHub, including GitHub
> issue templates with a drop-down component(s) selector. These form elements
> drive only the resulting issue markdown and cannot dynamically apply issue
> labels. Applying labels requires GitHub Actions, which come with a few
> limitations. First, the issue form does not produce any structured data, only
> the issue description markdown, so a parser is required. Second, ASF restricts
> GitHub Actions to a selection of approved actions. While community actions
> likely exist to generate structured data from issue forms, the Apache Arrow
> project will probably need to write its own parser and label-application
> action.
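The parser the ticket anticipates only has to recover the selected components from the rendered issue body. A sketch, assuming the form renders the selection under a "### Component(s)" heading (that heading text is an assumption about the template, not confirmed by this issue):

```python
import re

def components_from_issue_body(body):
    """Extract 'Component: X' labels from issue-form markdown (heading name assumed)."""
    m = re.search(r"### Component\(s\)\s*\n+([^\n#]+)", body)
    if not m:
        return []
    # the form renders multi-selects as a comma-separated line
    return ["Component: " + c.strip() for c in m.group(1).split(",") if c.strip()]
```

A label-applying workflow would run this over the issue body on the `issues: opened` event and add the resulting labels via the GitHub API.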
[jira] [Updated] (ARROW-16795) [C#][Flight] Nightly verify-rc-source-csharp-macos-arm64 fails
[ https://issues.apache.org/jira/browse/ARROW-16795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-16795:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C#][Flight] Nightly verify-rc-source-csharp-macos-arm64 fails
> --------------------------------------------------------------
>
>                Key: ARROW-16795
>                URL: https://issues.apache.org/jira/browse/ARROW-16795
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: C#, FlightRPC
>   Affects Versions: 9.0.0
>           Reporter: Raúl Cumplido
>           Priority: Critical
>             Labels: pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> The "verify-rc-source-csharp-macos-arm64" job has been failing on and off
> since ~May 18th; the issue seems to be with the Flight tests.
> {code:java}
> Failed Apache.Arrow.Flight.Tests.FlightTests.TestGetFlightMetadata [567 ms]
> Error Message:
>  Grpc.Core.RpcException : Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.", DebugException="System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.
>    at System.Net.Http.HttpConnectionPool.ReturnHttp2Connection(Http2Connection connection, Boolean isNewConnection)
>    at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)
>    at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
>    at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
>    at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
>    at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecuteFromThreadPool(Thread threadPoolThread)
>    at System.Threading.ThreadPoolWorkQueue.Dispatch()
>    at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
>    at System.Threading.Thread.StartCallback()
> --- End of stack trace from previous location ---
>    at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
>    at System.Net.Http.HttpConnectionPool.GetHttp2ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
>    at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
>    at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
>    at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)")
> Stack Trace:
>    at Grpc.Net.Client.Internal.GrpcCall`2.GetResponseHeadersCoreAsync()
>    at Apache.Arrow.Flight.Client.FlightClient.<>c.d.MoveNext() in /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/arrow/csharp/src/Apache.Arrow.Flight/Client/FlightClient.cs:line 71
> --- End of stack trace from previous location ---
>    at Apache.Arrow.Flight.Tests.FlightTests.TestGetFlightMetadata() in /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/arrow/csharp/test/Apache.Arrow.Flight.Tests/FlightTests.cs:line 183
> --- End of stack trace from previous location ---
> Failed Apache.Arrow.Flight.Tests.FlightTests.TestGetSchema [108 ms]
> Error Message:
>  Grpc.Core.RpcException : Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.", DebugException="System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake.
>    at System.Net.Http.HttpConnectionPool.ReturnHttp2Connection(Http2Connection connection, Boolean isNewConnection)
>    at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)
>    at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
>    at System.Net.Http.HttpConnectionPool.AddHttp2ConnectionAsync(HttpRequestMessage request)
>    at System.Net.Http.HttpConnectionPool.<>c__DisplayClass78_0.b__0()
>    at System.Threading.Tasks.Task`1.InnerInvoke()
>    at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
>    at
[jira] [Updated] (ARROW-18161) [Ruby] Tables can have buffers get GC'ed
[ https://issues.apache.org/jira/browse/ARROW-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18161:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Ruby] Tables can have buffers get GC'ed
> ----------------------------------------
>
>                Key: ARROW-18161
>                URL: https://issues.apache.org/jira/browse/ARROW-18161
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: Ruby
>   Affects Versions: 9.0.0
>        Environment: Ruby 3.1.2
>           Reporter: Noah Horton
>           Assignee: Kouhei Sutou
>           Priority: Major
>             Labels: pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> Given an Arrow::Table with several columns, "x":
> {code:ruby}
> # Rails console outputs
> 3.1.2 :107 > x.schema
>  =>
> #
> dates: date32[day]
> expected_values: double>
> 3.1.2 :108 > x.schema
>  =>
> #
> dates: date32[day]
> expected_values: double>
> 3.1.2 :109 >
> {code}
> Note that the object and pointer have both changed values.
>
> But the far bigger issue is that repeated reads from it will return different
> results:
> {code:ruby}
> 3.1.2 :097 > x[1][0]
>  => Sun, 22 Aug 2021
> 3.1.2 :098 > x[1][1]
>  => nil
> 3.1.2 :099 > x[1][0]
>  => nil
> {code}
> I have a lot of issues like this: when I have done these types of read
> operations, I get the original table with the data in the columns shuffled
> around or deleted.
>
> I do ingest the data slightly unusually in the first place, as it comes in
> over gRPC and I am using Arrow::Buffer to read it from the gRPC stream and
> then passing that into Arrow::Table.load. But I would not expect that, once it
> was in an Arrow::Table, I could do anything to permute it unintentionally.
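The underlying hazard is language-agnostic: if a table object keeps only a raw pointer to its backing buffer rather than a reference the garbage collector can see, the buffer can be reclaimed while the table still looks usable. A minimal sketch of that failure mode in Python (illustrative class names; CPython's refcounting stands in for Ruby's GC here):

```python
import gc
import weakref

class Buffer:
    def __init__(self, data):
        self.data = data

class UnsafeTable:
    """Keeps only a weak reference: the backing buffer can be GC'ed away."""
    def __init__(self, buffer):
        self.buffer_ref = weakref.ref(buffer)

class SafeTable:
    """Keeps a strong reference: the buffer lives at least as long as the table."""
    def __init__(self, buffer):
        self.buffer = buffer

buf1, buf2 = Buffer(b"a"), Buffer(b"b")
unsafe, safe = UnsafeTable(buf1), SafeTable(buf2)
del buf1, buf2
gc.collect()

assert unsafe.buffer_ref() is None   # data vanished under the "table"
assert safe.buffer.data == b"b"      # still reachable through the table
```

The fix in a binding is the SafeTable shape: the wrapper object must hold a GC-visible reference to every buffer it was constructed from.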
[jira] [Updated] (ARROW-11631) [R] Implement RPrimitiveConverter for Decimal type
[ https://issues.apache.org/jira/browse/ARROW-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-11631:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] Implement RPrimitiveConverter for Decimal type
> --------------------------------------------------
>
>                Key: ARROW-11631
>                URL: https://issues.apache.org/jira/browse/ARROW-11631
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: R
>   Affects Versions: 3.0.0
>           Reporter: Ian Cook
>           Assignee: Dewey Dunnington
>           Priority: Major
>             Labels: pull-request-available
>            Fix For: 11.0.0
>
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> This succeeds:
> {code:r}
> Array$create(1)$cast(decimal(10, 2))
> {code}
> but this fails:
> {code:r}
> Array$create(1, type = decimal(10, 2))
> {code}
> with error:
> {code}
> NotImplemented: Extend
> {code}
> because the {{Extend}} method of the {{RPrimitiveConverter}} class for the
> Decimal type is not yet implemented.
>
> The error is thrown here:
> https://github.com/apache/arrow/blob/7184c3f46981dd52c3c521b2676796e82f17da77/r/src/r_to_arrow.cpp#L601
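For context on what such a converter has to compute: Arrow decimals store each value as an integer scaled by 10^scale, so converting the R double 1 to decimal(10, 2) means producing the unscaled integer 100 and checking it fits the declared precision. A pure-Python sketch of that step (not the actual C++ RPrimitiveConverter):

```python
from decimal import ROUND_HALF_EVEN, Decimal

def to_decimal_storage(value, precision, scale):
    """Return the unscaled integer Arrow would store for decimal(precision, scale)."""
    # round to `scale` fractional digits, then shift the point right by `scale`
    q = Decimal(str(value)).quantize(Decimal(1).scaleb(-scale), ROUND_HALF_EVEN)
    unscaled = int(q.scaleb(scale))
    if len(str(abs(unscaled))) > precision:
        raise ValueError(f"value does not fit in decimal({precision}, {scale})")
    return unscaled
```

The cast path that already works does exactly this conversion inside the kernel; the missing Extend method would do it element by element while building the array.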
[jira] [Updated] (ARROW-18400) [Python] Quadratic memory usage of Table.to_pandas with nested data
[ https://issues.apache.org/jira/browse/ARROW-18400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18400:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Quadratic memory usage of Table.to_pandas with nested data
> -------------------------------------------------------------------
>
>                Key: ARROW-18400
>                URL: https://issues.apache.org/jira/browse/ARROW-18400
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: Python
>   Affects Versions: 10.0.1
>        Environment: Python 3.10.8 on Fedora Linux 36. AMD Ryzen 9 5900X with 64 GB RAM
>           Reporter: Adam Reeve
>           Assignee: Alenka Frim
>           Priority: Critical
>             Labels: pull-request-available
>            Fix For: 11.0.0
>        Attachments: test_memory.py
>
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> Reading nested Parquet data and then converting it to a Pandas DataFrame
> shows quadratic memory usage and will eventually run out of memory for
> reasonably small files. I had initially thought this was a regression since
> 7.0.0, but it looks like 7.0.0 has similar quadratic memory usage that kicks
> in at higher row counts.
>
> Example code to generate nested Parquet data:
> {code:python}
> import numpy as np
> import random
> import string
> import pandas as pd
>
> _characters = string.ascii_uppercase + string.digits + string.punctuation
>
> def make_random_string(N=10):
>     return ''.join(random.choice(_characters) for _ in range(N))
>
> nrows = 1_024_000
> filename = 'nested.parquet'
>
> arr_len = 10
> nested_col = []
> for i in range(nrows):
>     nested_col.append(np.array(
>         [{
>             'a': None if i % 1000 == 0 else np.random.choice(1, size=3).astype(np.int64),
>             'b': None if i % 100 == 0 else random.choice(range(100)),
>             'c': None if i % 10 == 0 else make_random_string(5)
>         } for i in range(arr_len)]
>     ))
>
> df = pd.DataFrame({'c1': nested_col})
> df.to_parquet(filename)
> {code}
> And then read into a DataFrame with:
> {code:python}
> import pyarrow.parquet as pq
>
> table = pq.read_table(filename)
> df = table.to_pandas()
> {code}
> Only reading to an Arrow table isn't a problem; it's the to_pandas method
> that exhibits the large memory usage. I haven't tested generating nested
> Arrow data in memory without writing Parquet from Pandas, but I assume the
> problem probably isn't Parquet specific.
>
> Memory usage I see when reading different sized files on a machine with 64 GB
> RAM:
> ||Num rows||Memory used with 10.0.1 (MB)||Memory used with 7.0.0 (MB)||
> |32,000|362|361|
> |64,000|531|531|
> |128,000|1,152|1,101|
> |256,000|2,888|1,402|
> |512,000|10,301|3,508|
> |1,024,000|38,697|5,313|
> |2,048,000|OOM|20,061|
> |4,096,000| |OOM|
>
> With Arrow 10.0.1, memory usage approximately quadruples when row count
> doubles above 256k rows. With Arrow 7.0.0 memory usage is more linear but
> then quadruples from 1024k to 2048k rows.
>
> PyArrow 8.0.0 shows similar memory usage to 10.0.1, so it looks like
> something changed between 7.0.0 and 8.0.0.
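The table above was presumably gathered with an external process monitor. For a quick in-process comparison of how a conversion step's peak allocation scales with input size, the stdlib tracemalloc module is handy (it only counts Python-level allocations, so it undercounts Arrow's C++ buffers, but it can still show a quadratic trend); the helper name below is ours, not part of any library:

```python
import tracemalloc

def peak_mb(fn, *args):
    """Run fn(*args) and return peak Python-level allocation in MiB."""
    tracemalloc.start()
    fn(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2**20
```

Calling this with doubling row counts and plotting the ratio of successive peaks makes linear vs. quadratic growth obvious without waiting for an OOM.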
[jira] [Updated] (ARROW-18086) [Ruby] Importing table containing float16 array throws error
[ https://issues.apache.org/jira/browse/ARROW-18086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18086:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Ruby] Importing table containing float16 array throws error
> ------------------------------------------------------------
>
>                Key: ARROW-18086
>                URL: https://issues.apache.org/jira/browse/ARROW-18086
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: Ruby
>   Affects Versions: 9.0.0
>           Reporter: Atte Keinänen
>           Assignee: Kouhei Sutou
>           Priority: Minor
>             Labels: pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> In Red Arrow, loading a table containing a float16 array leads to this error
> when using the IPC streaming format:
> {code:ruby}
> > Arrow::Table.load(Arrow::Buffer.new(resp.body), format: :arrow_streaming)
> cannot create instance of abstract (non-instantiatable) type 'GArrowDataType'
> from /usr/local/bundle/gems/gobject-introspection-4.0.3/lib/gobject-introspection/loader.rb:688:in `invoke'
> from /usr/local/bundle/gems/gobject-introspection-4.0.3/lib/gobject-introspection/loader.rb:559:in `get_field'
> {code}
> At least with a float64 list array this does not happen.
[jira] [Updated] (ARROW-11402) [C++][Dataset] Allow more aggressive implicit casts for literals
[ https://issues.apache.org/jira/browse/ARROW-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-11402:
-----------------------------------
    Labels: dataset pull-request-available  (was: dataset)

> [C++][Dataset] Allow more aggressive implicit casts for literals
> ----------------------------------------------------------------
>
>                Key: ARROW-11402
>                URL: https://issues.apache.org/jira/browse/ARROW-11402
>            Project: Apache Arrow
>         Issue Type: Improvement
>         Components: C++
>           Reporter: Ben Kietzman
>           Priority: Major
>             Labels: dataset, pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> After ARROW-8919, a literal in an Expression may cause unnecessary implicit
> casting of a column. For example, {{equal(field_ref("i8"), literal(1))}} will
> cause column i8 to be promoted to the type of the literal (int32) for
> comparison. Since we have access to the literal value at bind time, we could
> examine {{1}} and determine that it can safely be *de*moted to int8, which
> produces a semantically equivalent and more performant filter.
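The bind-time check being suggested is a range test: a literal can be demoted to the column's narrower integer type exactly when its value fits that type's range. A small sketch of that rule (illustrative, not Acero's actual bind logic):

```python
INT_RANGES = {
    "int8": (-128, 127),
    "int16": (-32768, 32767),
    "int32": (-2**31, 2**31 - 1),
}

def demote_literal(value, column_type):
    """Return (value, demoted_type) if the literal fits the column's type,
    else (value, None) meaning the demotion is unsafe."""
    lo, hi = INT_RANGES[column_type]
    return (value, column_type) if lo <= value <= hi else (value, None)
```

When the value does not fit (e.g. comparing an int8 column against 300), the comparison can even be folded to a constant true/false instead of casting the column up, but the sketch leaves that refinement out.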
[jira] [Updated] (ARROW-17302) [R] Configure curl timeout policy for S3
[ https://issues.apache.org/jira/browse/ARROW-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-17302:
-----------------------------------
    Labels: good-first-issue pull-request-available  (was: good-first-issue)

> [R] Configure curl timeout policy for S3
> ----------------------------------------
>
>                Key: ARROW-17302
>                URL: https://issues.apache.org/jira/browse/ARROW-17302
>            Project: Apache Arrow
>         Issue Type: Sub-task
>         Components: R
>   Affects Versions: 9.0.0
>           Reporter: Dragoș Moldovan-Grünfeld
>           Priority: Major
>             Labels: good-first-issue, pull-request-available
>            Fix For: 11.0.0
>
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> See ARROW-16521
[jira] [Updated] (ARROW-15735) [C++] Hash aggregate functions to return first and last value from a group.
[ https://issues.apache.org/jira/browse/ARROW-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-15735:
-----------------------------------
    Labels: kernel pull-request-available  (was: kernel)

> [C++] Hash aggregate functions to return first and last value from a group.
> ---------------------------------------------------------------------------
>
>                Key: ARROW-15735
>                URL: https://issues.apache.org/jira/browse/ARROW-15735
>            Project: Apache Arrow
>         Issue Type: New Feature
>         Components: C++
>           Reporter: A. Coady
>           Assignee: Sanjiban Sengupta
>           Priority: Major
>             Labels: kernel, pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> Follow-up to ARROW-13993, which implemented `hash_one` to select an arbitrary
> value, as the core engine lacks support for ordering. I think `first` and
> `last` will still be in demand though, based on pandas and SQL usage.
>
> It could be done without core changes by using `min_max` on an array of
> indices. For that reason, maybe it would be better as
> `hash_{first,last}_index`, suitable for use with `take`.
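The min_max-on-indices workaround described above, in miniature: track the smallest and largest row index seen per group, then "take" those rows to recover the first/last values. A plain-Python sketch, not Acero's hash-aggregate machinery:

```python
def hash_first_last(keys, values):
    """Per group key, return (first value, last value) by row order."""
    idx = {}
    for i, k in enumerate(keys):
        first, last = idx.get(k, (i, i))
        idx[k] = (min(first, i), max(last, i))   # min_max over indices
    # the "take" step: gather values at the recorded index extremes
    return {k: (values[f], values[l]) for k, (f, l) in idx.items()}
```

This is order-dependent by construction, which is exactly why a streaming engine without ordering guarantees only offered `hash_one`; the index trick sidesteps that because indices carry the ordering.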
[jira] [Updated] (ARROW-18318) [Python] Expose Scalar.validate
[ https://issues.apache.org/jira/browse/ARROW-18318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18318:
-----------------------------------
    Labels: good-first-issue pull-request-available  (was: good-first-issue)

> [Python] Expose Scalar.validate
> -------------------------------
>
>                Key: ARROW-18318
>                URL: https://issues.apache.org/jira/browse/ARROW-18318
>            Project: Apache Arrow
>         Issue Type: Improvement
>         Components: Python
>           Reporter: Antoine Pitrou
>           Priority: Major
>             Labels: good-first-issue, pull-request-available
>            Fix For: 11.0.0
>
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> In C++, scalars have {{Validate}} and {{ValidateFull}} methods, just like
> arrays. However, these methods were not exposed on PyArrow scalars (while
> they are on PyArrow arrays).
[jira] [Updated] (ARROW-15206) [Ruby] Allow to pass schema when loading table from file
[ https://issues.apache.org/jira/browse/ARROW-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-15206:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Ruby] Allow to pass schema when loading table from file
> --------------------------------------------------------
>
>                Key: ARROW-15206
>                URL: https://issues.apache.org/jira/browse/ARROW-15206
>            Project: Apache Arrow
>         Issue Type: Improvement
>         Components: Ruby
>           Reporter: Kanstantsin Ilchanka
>           Assignee: Kouhei Sutou
>           Priority: Minor
>             Labels: pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> It is possible to do this in C++, but not in Ruby:
> {code:ruby}
> schema = Arrow::Schema.new(a: :int64, b: :double)
> Arrow::Table.load(URI('file:///tmp/example.csv'), format: :csv, schema: schema)
> {code}
> This should also work when loading multiple files from a folder.
[jira] [Updated] (ARROW-18202) [R][C++] Different behaviour of R's base::gsub() binding aka libarrow's replace_string_regex kernel since 10.0.0
[ https://issues.apache.org/jira/browse/ARROW-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18202:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R][C++] Different behaviour of R's base::gsub() binding aka libarrow's
> replace_string_regex kernel since 10.0.0
> -----------------------------------------------------------------------
>
>                Key: ARROW-18202
>                URL: https://issues.apache.org/jira/browse/ARROW-18202
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: C++, R
>   Affects Versions: 10.0.0
>           Reporter: Lorenzo Isella
>           Assignee: Will Jones
>           Priority: Critical
>             Labels: pull-request-available
>            Fix For: 11.0.0
>
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> Hello,
> I think there is a problem with arrow 10.0 and R. I did not have this issue
> with arrow 9.0. Could you please have a look?
> Many thanks
>
> {code:r}
> library(tidyverse)
> library(arrow)
>
> ll <- c("100", "1000", "200", "3000", "50", "500", "", "Not Range")
>
> df <- tibble(x = rep(ll, 1000), y = seq(8000))
> write_tsv(df, "data.tsv")
>
> data <- open_dataset("data.tsv", format = "tsv",
>                      skip_rows = 1,
>                      schema = schema(x = string(), y = double()))
>
> test <- data |>
>   collect()
>
> ### I want to replace the "" with "0". I believe this worked with arrow 9.0
> df2 <- data |>
>   mutate(x = gsub("^$", "0", x)) |>
>   collect()
>
> df2 ### now I did not modify the "" entries in x
> #> # A tibble: 8,000 × 2
> #>    x           y
> #>  1 "100"       1
> #>  2 "1000"      2
> #>  3 "200"       3
> #>  4 "3000"      4
> #>  5 "50"        5
> #>  6 "500"       6
> #>  7 ""          7
> #>  8 "Not Range" 8
> #>  9 "100"       9
> #> 10 "1000"     10
> #> # … with 7,990 more rows
>
> df3 <- df |>
>   mutate(x = gsub("^$", "0", x))
>
> df3 ## and this is fine
> #> # A tibble: 8,000 × 2
> #>    x         y
> #>  1 100       1
> #>  2 1000      2
> #>  3 200       3
> #>  4 3000      4
> #>  5 50        5
> #>  6 500       6
> #>  7 0         7
> #>  8 Not Range 8
> #>  9 100       9
> #> 10 1000     10
> #> # … with 7,990 more rows
> ## How to fix this... I believe this issue did not arise with arrow 9.0.
>
> sessionInfo()
> #> R version 4.2.1 (2022-06-23)
> #> Platform: x86_64-pc-linux-gnu (64-bit)
> #> Running under: Debian GNU/Linux 11 (bullseye)
> #>
> #> Matrix products: default
> #> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> #> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
> #>
> #> locale:
> #>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
> #>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
> #>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
> #>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
> #>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> #> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> #>
> #> attached base packages:
> #> [1] stats graphics grDevices utils datasets methods base
> #>
> #> other attached packages:
> #> [1] arrow_10.0.0    forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10
> #> [5] purrr_0.3.5     readr_2.1.3     tidyr_1.2.1     tibble_3.1.8
> #> [9] ggplot2_3.3.6   tidyverse_1.3.2
> #>
> #> loaded via a namespace (and not attached):
> #>  [1] lubridate_1.8.0     assertthat_0.2.1    digest_0.6.30
> #>  [4] utf8_1.2.2          R6_2.5.1            cellranger_1.1.0
> #>  [7] backports_1.4.1     reprex_2.0.2        evaluate_0.17
> #> [10] httr_1.4.4          highr_0.9           pillar_1.8.1
> #> [13] rlang_1.0.6         googlesheets4_1.0.1 readxl_1.4.1
> #> [16] R.utils_2.12.1      R.oo_1.25.0         rmarkdown_2.17
> #> [19] styler_1.8.0        googledrive_2.0.0   bit_4.0.4
> #> [22] munsell_0.5.0       broom_1.0.1         compiler_4.2.1
> #> [25] modelr_0.1.9        xfun_0.34           pkgconfig_2.0.3
> #> [28] htmltools_0.5.3     tidyselect_1.2.0    fansi_1.0.3
> #> [31] crayon_1.5.2        tzdb_0.3.0          dbplyr_2.2.1
> #> [34] withr_2.5.0         R.methodsS3_1.8.2   grid_4.2.1
> #> [37] jsonlite_1.8.3      gtable_0.3.1        lifecycle_1.0.3
> #> [40] DBI_1.1.3           magrittr_2.0.3      scales_1.2.1
> #> [43] vroom_1.6.0         cli_3.4.1           stringi_1.7.8
> #> [46] fs_1.5.2            xml2_1.3.3          ellipsis_0.3.2
> #> [49] generics_0.1.3      vctrs_0.5.0         tools_4.2.1
> #> [52] bit64_4.0.5         R.cache_0.16.0      glue_1.6.2
> #> [55] hms_1.1.2
> {code}
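For reference, the base-R behaviour the binding should match: gsub("^$", "0", x) replaces only fully-empty strings and leaves everything else alone. Python's re.sub agrees with base R here, which makes it a convenient cross-check for the replace_string_regex kernel's expected output (helper name is ours):

```python
import re

def fill_empty(strings, fill="0"):
    """Replace only fully-empty strings, like base R's gsub('^$', fill, x)."""
    return [re.sub(r"^$", fill, s) for s in strings]
```

Whatever the 10.0.0 kernel is doing, the expected result for the reporter's data is that the seventh value becomes "0" and "Not Range" is untouched.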
[jira] [Updated] (ARROW-18195) [R][C++] Final value returned by case_when is NA when input has 64 or more values and 1 or more NAs
[ https://issues.apache.org/jira/browse/ARROW-18195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-18195:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R][C++] Final value returned by case_when is NA when input has 64 or more
> values and 1 or more NAs
> --------------------------------------------------------------------------
>
>                Key: ARROW-18195
>                URL: https://issues.apache.org/jira/browse/ARROW-18195
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: C++, R
>   Affects Versions: 10.0.0
>           Reporter: Lee Mendelowitz
>           Assignee: Will Jones
>           Priority: Critical
>             Labels: pull-request-available
>            Fix For: 11.0.0
>        Attachments: test_issue.R
>
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> There appears to be a bug when processing an Arrow table with NA values and
> using `dplyr::case_when`. A reproducible example is below: the output from
> Arrow table processing does not match the output when processing a tibble. If
> the NAs are removed from the dataframe, the outputs match.
> {code:r}
> library(dplyr)
> library(arrow)
> library(assertthat)
>
> play_results = c('single', 'double', 'triple', 'home_run')
>
> nrows = 1000
> # Change frac_na to 0, and the result error disappears.
> frac_na = 0.05
>
> # Create a test dataframe with NA values
> test_df = tibble(
>   play_result = sample(play_results, nrows, replace = TRUE)
> ) %>%
>   mutate(
>     play_result = ifelse(runif(nrows) < frac_na, NA_character_, play_result)
>   )
>
> test_arrow = arrow_table(test_df)
>
> process_plays = function(df) {
>   df %>%
>     mutate(
>       avg = case_when(
>         play_result == 'single' ~ 1,
>         play_result == 'double' ~ 1,
>         play_result == 'triple' ~ 1,
>         play_result == 'home_run' ~ 1,
>         is.na(play_result) ~ NA_real_,
>         TRUE ~ 0
>       )
>     ) %>%
>     count(play_result, avg) %>%
>     arrange(play_result)
> }
>
> # Compare arrow_table result to tibble result
> result_tibble = process_plays(test_df)
> result_arrow = process_plays(test_arrow) %>% collect()
>
> assertthat::assert_that(identical(result_tibble, result_arrow))
> #> Error: result_tibble not identical to result_arrow
> {code}
> Created on 2022-10-29 with reprex v2.0.2 (https://reprex.tidyverse.org)
>
> I have reproduced this issue both on macOS and Ubuntu 20.04.
>
> {noformat}
> r$> sessionInfo()
> R version 4.2.1 (2022-06-23)
> Platform: aarch64-apple-darwin21.5.0 (64-bit)
> Running under: macOS Monterey 12.5.1
>
> Matrix products: default
> BLAS:   /opt/homebrew/Cellar/openblas/0.3.20/lib/libopenblasp-r0.3.20.dylib
> LAPACK: /opt/homebrew/Cellar/r/4.2.1/lib/R/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] assertthat_0.2.1 arrow_10.0.0     dplyr_1.0.10
>
> loaded via a namespace (and not attached):
>  [1] compiler_4.2.1    pillar_1.8.1      highr_0.9         R.methodsS3_1.8.2
>  [5] R.utils_2.12.0    tools_4.2.1       bit_4.0.4         digest_0.6.29
>  [9] evaluate_0.15     lifecycle_1.0.1   tibble_3.1.8      R.cache_0.16.0
> [13] pkgconfig_2.0.3   rlang_1.0.5       reprex_2.0.2      DBI_1.1.2
> [17] cli_3.3.0         rstudioapi_0.13   yaml_2.3.5        xfun_0.31
> [21] fastmap_1.1.0     withr_2.5.0       styler_1.8.0      knitr_1.39
> [25] generics_0.1.3    fs_1.5.2          vctrs_0.4.1       bit64_4.0.5
> [29] tidyselect_1.1.2  glue_1.6.2        R6_2.5.1          processx_3.5.3
> [33] fansi_1.0.3       rmarkdown_2.14    purrr_0.3.4       callr_3.7.0
> [37] clipr_0.8.0       magrittr_2.0.3    ellipsis_0.3.2    ps_1.7.0
> [41] htmltools_0.5.3   renv_0.16.0       utf8_1.2.2        R.oo_1.25.0
> {noformat}
[jira] [Updated] (ARROW-12264) [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down
[ https://issues.apache.org/jira/browse/ARROW-12264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-12264:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down
> -------------------------------------------------------------------
>
>                Key: ARROW-12264
>                URL: https://issues.apache.org/jira/browse/ARROW-12264
>            Project: Apache Arrow
>         Issue Type: Bug
>         Components: C++, Parquet
>           Reporter: Antoine Pitrou
>           Assignee: Sanjiban Sengupta
>           Priority: Major
>             Labels: pull-request-available
>         Time Spent: 10m
> Remaining Estimate: 0h
>
> The Parquet spec (in parquet.thrift) says the following about handling of
> floating-point statistics:
> {code}
>    * (*) Because the sorting order is not specified properly for floating
>    *     point values (relations vs. total ordering) the following
>    *     compatibility rules should be applied when reading statistics:
>    *     - If the min is a NaN, it should be ignored.
>    *     - If the max is a NaN, it should be ignored.
>    *     - If the min is +0, the row group may contain -0 values as well.
>    *     - If the max is -0, the row group may contain +0 values as well.
>    *     - When looking for NaN values, min and max should be ignored.
> {code}
> It appears that the dataset code uses the following filter expression when
> doing Parquet predicate push-down (in {{file_parquet.cc}}):
> {code:c++}
> return and_(greater_equal(field_expr, literal(min)),
>             less_equal(field_expr, literal(max)));
> {code}
> A NaN value will fail that filter and yet may be found in the given Parquet
> column chunk.
>
> We may instead need a "greater_equal_or_nan" comparison that returns true if
> either value is NaN.
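The NaN-safe pruning rule the spec quote implies can be stated compactly: a chunk may match a probe value v when min <= v <= max, but NaN statistics say nothing and a NaN probe can never be ruled out by min/max. A Python sketch of that rule (illustrative, not the file_parquet.cc implementation; the +0/-0 subtlety is left out):

```python
import math

def chunk_may_contain(stat_min, stat_max, value):
    """Conservative row-group pruning check per the parquet.thrift notes."""
    if isinstance(value, float) and math.isnan(value):
        return True  # when looking for NaN, min and max must be ignored
    if stat_min is not None and not math.isnan(stat_min) and value < stat_min:
        return False  # a NaN min would have been ignored here
    if stat_max is not None and not math.isnan(stat_max) and value > stat_max:
        return False  # a NaN max would have been ignored here
    return True
```

The bug described above is the `value < stat_min` / `value > stat_max` test being applied to a NaN probe, where both comparisons are false in a way that wrongly eliminates the chunk.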
[jira] [Updated] (ARROW-16212) [C++][Python] Register Multiple Kernels for a UDF
[ https://issues.apache.org/jira/browse/ARROW-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16212: --- Labels: pull-request-available (was: ) > [C++][Python] Register Multiple Kernels for a UDF > - > > Key: ARROW-16212 > URL: https://issues.apache.org/jira/browse/ARROW-16212 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++, Python >Reporter: Vibhatha Lakmal Abeykoon >Assignee: Vibhatha Lakmal Abeykoon >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The current UDF integration > (https://issues.apache.org/jira/browse/ARROW-15639) doesn't support multiple > kernel registration. It only supports registering a single kernel under a > given function name. Enabling multiple kernels to be registered under the > same function name is a must for usability and consistency with the existing > functions in the function registry. -- This message was sent by Atlassian Jira (v8.20.10#820010)
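The limitation can be pictured with a toy registry (this is a schematic, not the Arrow C++/Python API): a function name must map to a set of kernels keyed by input types, not to a single kernel.

```python
# Toy function registry illustrating multi-kernel dispatch; names and
# structure are illustrative only.
registry = {}

def register_kernel(name, in_types, func):
    kernels = registry.setdefault(name, {})
    if in_types in kernels:
        raise ValueError(f"kernel {in_types} already registered for {name}")
    kernels[in_types] = func

def call_function(name, *args):
    # Dispatch on the argument type signature, as the compute registry
    # would dispatch on Arrow input types.
    sig = tuple(type(a).__name__ for a in args)
    return registry[name][sig](*args)

# Two kernels under one function name:
register_kernel("plus_one", ("int",), lambda x: x + 1)
register_kernel("plus_one", ("float",), lambda x: x + 1.0)

assert call_function("plus_one", 41) == 42
assert call_function("plus_one", 1.5) == 2.5
```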
[jira] [Updated] (ARROW-18403) [C++] Error consuming Substrait plan which uses count function: "only unary aggregate functions are currently supported"
[ https://issues.apache.org/jira/browse/ARROW-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18403: --- Labels: pull-request-available substrait (was: substrait) > [C++] Error consuming Substrait plan which uses count function: "only unary > aggregate functions are currently supported" > > > Key: ARROW-18403 > URL: https://issues.apache.org/jira/browse/ARROW-18403 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Nicola Crane >Priority: Major > Labels: pull-request-available, substrait > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-17523 added support for the Substrait extension function "count", but > when I write code which produces a Substrait plan which calls it, and then > try to run it in Acero, I get an error. > The plan: > {code:r} > message of type 'substrait.Plan' with 3 fields set > extension_uris { > extension_uri_anchor: 1 > uri: > "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml; > } > extension_uris { > extension_uri_anchor: 2 > uri: > "https://github.com/substrait-io/substrait/blob/main/extensions/functions_comparison.yaml; > } > extension_uris { > extension_uri_anchor: 3 > uri: > "https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml; > } > extensions { > extension_function { > extension_uri_reference: 3 > function_anchor: 2 > name: "count" > } > } > relations { > rel { > aggregate { > input { > project { > common { > emit { > output_mapping: 9 > output_mapping: 10 > output_mapping: 11 > output_mapping: 12 > output_mapping: 13 > output_mapping: 14 > output_mapping: 15 > output_mapping: 16 > output_mapping: 17 > } > } > input { > read { > base_schema { > names: "int" > names: "dbl" > names: "dbl2" > names: "lgl" > names: "false" > names: "chr" > names: "verses" > names: "padded_strings" > names: "some_negative" > struct_ { > types { > i32 { > nullability: NULLABILITY_NULLABLE > } > } > 
types { > fp64 { > nullability: NULLABILITY_NULLABLE > } > } > types { > fp64 { > nullability: NULLABILITY_NULLABLE > } > } > types { > bool_ { > nullability: NULLABILITY_NULLABLE > } > } > types { > bool_ { > nullability: NULLABILITY_NULLABLE > } > } > types { > string { > nullability: NULLABILITY_NULLABLE > } > } > types { > string { > nullability: NULLABILITY_NULLABLE > } > } > types { > string { > nullability: NULLABILITY_NULLABLE > } > } > types { > fp64 { > nullability: NULLABILITY_NULLABLE > } > } > } > } > local_files { > items { > uri_file: "file:///tmp/RtmpsBsoZJ/file1915f604cff4a" > parquet { > } > } > } > } > } > expressions { > selection { > direct_reference { > struct_field { > } > } > root_reference { > } > } > } > expressions { > selection { > direct_reference { > struct_field { > field: 1 > } > } > root_reference { > } > } > } > expressions { > selection { > direct_reference { >
[jira] [Updated] (ARROW-18394) [CI][Python] Nightly python pandas jobs using latest or upstream_devel fail
[ https://issues.apache.org/jira/browse/ARROW-18394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18394: --- Labels: Nightly pull-request-available (was: Nightly) > [CI][Python] Nightly pyhon pandas jobs using latest or upstream_devel fail > -- > > Key: ARROW-18394 > URL: https://issues.apache.org/jira/browse/ARROW-18394 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Raúl Cumplido >Assignee: Joris Van den Bossche >Priority: Critical > Labels: Nightly, pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently the following jobs fail: > |test-conda-python-3.8-pandas-nightly|https://github.com/ursacomputing/crossbow/actions/runs/3532562061/jobs/5927065343| > |test-conda-python-3.9-pandas-upstream_devel|https://github.com/ursacomputing/crossbow/actions/runs/3532562477/jobs/5927066168| > with: > {code:java} > _ test_roundtrip_with_bytes_unicode[columns0] > __columns = [b'foo'] @pytest.mark.parametrize('columns', > ([b'foo'], ['foo'])) > def test_roundtrip_with_bytes_unicode(columns): > df = pd.DataFrame(columns=columns) > table1 = pa.Table.from_pandas(df) > > table2 = > > pa.Table.from_pandas(table1.to_pandas())opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_pandas.py:2867: > > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > pyarrow/array.pxi:830: in pyarrow.lib._PandasConvertible.to_pandas > ??? > pyarrow/table.pxi:3908: in pyarrow.lib.Table._to_pandas > ??? 
> opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:819: > in table_to_blockmanager > columns = _deserialize_column_index(table, all_columns, column_indexes) > opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:935: > in _deserialize_column_index > columns = _reconstruct_columns_from_metadata(columns, column_indexes) > opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/pandas_compat.py:1154: > in _reconstruct_columns_from_metadata > level = level.astype(dtype) > opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/indexes/base.py:1029: > in astype > return Index(new_values, name=self.name, dtype=new_values.dtype, > copy=False) > opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/indexes/base.py:518: > in __new__ > klass = cls._dtype_to_subclass(arr.dtype) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ cls = , dtype = dtype('S3') > @final > @classmethod > def _dtype_to_subclass(cls, dtype: DtypeObj): > # Delay import for perf. 
> https://github.com/pandas-dev/pandas/pull/31423 > > if isinstance(dtype, ExtensionDtype): > if isinstance(dtype, DatetimeTZDtype): > from pandas import DatetimeIndex > > return DatetimeIndex > elif isinstance(dtype, CategoricalDtype): > from pandas import CategoricalIndex > > return CategoricalIndex > elif isinstance(dtype, IntervalDtype): > from pandas import IntervalIndex > > return IntervalIndex > elif isinstance(dtype, PeriodDtype): > from pandas import PeriodIndex > > return PeriodIndex > > return Index > > if dtype.kind == "M": > from pandas import DatetimeIndex > > return DatetimeIndex > > elif dtype.kind == "m": > from pandas import TimedeltaIndex > > return TimedeltaIndex > > elif dtype.kind == "f": > from pandas.core.api import Float64Index > > return Float64Index > elif dtype.kind == "u": > from pandas.core.api import UInt64Index > > return UInt64Index > elif dtype.kind == "i": > from pandas.core.api import Int64Index > > return Int64Index > > elif dtype.kind == "O": > # NB: assuming away MultiIndex > return Index > > elif issubclass( > dtype.type, (str, bool, np.bool_, complex, np.complex64, > np.complex128) > ): > return Index > > > raise NotImplementedError(dtype) > E NotImplementedError: > |S3opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/indexes/base.py:595: > NotImplementedError{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
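The root of the failure is visible without Arrow: a `b'foo'` column label round-trips through NumPy as a fixed-width bytes dtype, whose kind `'S'` falls through every branch of the `_dtype_to_subclass` code quoted above.

```python
import numpy as np

# b'foo' becomes a fixed-width bytes dtype ('S3'); np.bytes_ is a
# subclass of bytes, not str, so it matches none of the branches and
# triggers the NotImplementedError('S3') seen in the traceback.
labels = np.array([b"foo"])
assert labels.dtype == np.dtype("S3")
assert labels.dtype.kind == "S"
```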
[jira] [Updated] (ARROW-17538) [C++] Importing an ArrowArrayStream can't handle errors from get_schema
[ https://issues.apache.org/jira/browse/ARROW-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17538: --- Labels: good-first-issue pull-request-available (was: good-first-issue) > [C++] Importing an ArrowArrayStream can't handle errors from get_schema > --- > > Key: ARROW-17538 > URL: https://issues.apache.org/jira/browse/ARROW-17538 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 9.0.0 >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: good-first-issue, pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > As indicated in the code: > https://github.com/apache/arrow/blob/cd3c6ead97d584366aafd2f14d99a1cb8ace9ca2/cpp/src/arrow/c/bridge.cc#L1823 > > This probably needs a static initializer so we can catch things. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18353) [C++][Flight] Sporadic hang in UCX tests
[ https://issues.apache.org/jira/browse/ARROW-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18353: --- Labels: pull-request-available (was: ) > [C++][Flight] Sporadic hang in UCX tests > > > Key: ARROW-18353 > URL: https://issues.apache.org/jira/browse/ARROW-18353 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC >Reporter: Antoine Pitrou >Assignee: David Li >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The UCX tests sometimes hang here. > Full gdb backtraces for all threads: > {code} > Thread 8 (Thread 0x7f4562fcd700 (LWP 76837)): > #0 0x7f4577b72ad3 in futex_wait_cancelable (private=, > expected=0, futex_word=0x564ebe5b5b3c) > at ../sysdeps/unix/sysv/linux/futex-internal.h:88 > #1 __pthread_cond_wait_common (abstime=0x0, mutex=0x564ebe5b5ae0, > cond=0x564ebe5b5b10) at pthread_cond_wait.c:502 > #2 __pthread_cond_wait (cond=0x564ebe5b5b10, mutex=0x564ebe5b5ae0) at > pthread_cond_wait.c:655 > #3 0x7f457b4ce7cb in > std::condition_variable::wait namespace)::WriteClientStream::WritesDone():: > >(std::unique_lock &, struct {...}) (this=0x564ebe5b5b10, > __lock=..., __p=...) > at > /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/10.4.0/condition_variable:111 > #4 0x7f457b4c7b5e in arrow::flight::transport::ucx::(anonymous > namespace)::WriteClientStream::WritesDone (this=0x564ebe5b5a90) > at /arrow/cpp/src/arrow/flight/transport/ucx/ucx_client.cc:277 > #5 0x7f457b4cc989 in arrow::flight::transport::ucx::(anonymous > namespace)::UcxClientStream::DoFinish (this=0x564ebe5b5a90) > at /arrow/cpp/src/arrow/flight/transport/ucx/ucx_client.cc:692 > #6 0x7f457af80e04 in arrow::flight::internal::ClientDataStream::Finish > (this=0x564ebe5b5a90, st=...) 
at /arrow/cpp/src/arrow/flight/transport.cc:46 > #7 0x7f457af4f6e1 in arrow::flight::ClientMetadataReader::ReadMetadata > (this=0x564ebe560630, out=0x7f4562fcc170) > at /arrow/cpp/src/arrow/flight/client.cc:263 > #8 0x7f457b593af6 in operator() (__closure=0x564ebe4e4848) at > /arrow/cpp/src/arrow/flight/test_definitions.cc:1538 > #9 0x7f457b5b66b8 in std::__invoke_impl arrow::flight::ErrorHandlingTest::TestDoPut():: > >(std::__invoke_other, struct {...} &&) (__f=...) > at > /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/invoke.h:60 > #10 0x7f457b5b6529 in > std::__invoke > >(struct {...} &&) (__fn=...) > at > /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/invoke.h:95 > #11 0x7f457b5b63c4 in > std::thread::_Invoker > > >::_M_invoke<0>(std::_Index_tuple<0>) ( > this=0x564ebe4e4848) at > /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/10.4.0/thread:264 > #12 0x7f457b5b6224 in > std::thread::_Invoker > > >::operator()(void) ( > this=0x564ebe4e4848) at > /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/10.4.0/thread:271 > #13 0x7f457b5b5e1e in > std::thread::_State_impl > > > >::_M_run(void) (this=0x564ebe4e4840) at > /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/10.4.0/thread:215 > #14 0x7f4578242a93 in std::execute_native_thread_routine (__p= out>) > at > /home/conda/feedstock_root/build_artifacts/gcc_compilers_1666516830325/work/build/x86_64-conda-linux-gnu/libstdc++-v3/include/new_allocator.h:82 > #15 0x7f4577b6c6db in start_thread (arg=0x7f4562fcd700) at > pthread_create.c:463 > #16 0x7f4577ea561f in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > Thread 7 (Thread 0x7f45725ca700 (LWP 76828)): > #0 0x7f4577ea5947 in epoll_wait (epfd=36, > events=events@entry=0x7f45725c86c0, maxevents=16, timeout=timeout@entry=0) > at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 > #1 0x7f45779fe3e3 in ucs_event_set_wait (event_set=0x7f4564026240, > num_events=num_events@entry=0x7f45725c8804, 
timeout_ms=timeout_ms@entry=0, > event_set_handler=event_set_handler@entry=0x7f4575d29320 > , arg=arg@entry=0x7f45725c8800) at > sys/event_set.c:198 > #2 0x7f4575d29283 in uct_tcp_iface_progress (tl_iface=0x7f4564026900) at > tcp/tcp_iface.c:327 > #3 0x7f4577a7de22 in ucs_callbackq_dispatch (cbq=) at > /usr/local/src/conda/ucx-1.13.1/src/ucs/datastruct/callbackq.h:211 > #4 uct_worker_progress (worker=) at > /usr/local/src/conda/ucx-1.13.1/src/uct/api/uct.h:2638 > #5 ucp_worker_progress (worker=0x7f4564000c80) at core/ucp_worker.c:2782 > #6 0x7f457b4f186f in > arrow::flight::transport::ucx::UcpCallDriver::Impl::MakeProgress > (this=0x7f456404d3b0) > at /arrow/cpp/src/arrow/flight/transport/ucx/ucx_internal.cc:759 > #7 0x7f457b4eee40 in >
[jira] [Updated] (ARROW-18351) [C++][Flight] Crash in UcxErrorHandlingTest.TestDoExchange
[ https://issues.apache.org/jira/browse/ARROW-18351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18351: --- Labels: pull-request-available (was: ) > [C++][Flight] Crash in UcxErrorHandlingTest.TestDoExchange > -- > > Key: ARROW-18351 > URL: https://issues.apache.org/jira/browse/ARROW-18351 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC >Reporter: Antoine Pitrou >Assignee: David Li >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I get a non-deterministic crash in the Flight UCX tests. > {code} > [--] 3 tests from UcxErrorHandlingTest > [ RUN ] UcxErrorHandlingTest.TestGetFlightInfo > [ OK ] UcxErrorHandlingTest.TestGetFlightInfo (24 ms) > [ RUN ] UcxErrorHandlingTest.TestDoPut > [ OK ] UcxErrorHandlingTest.TestDoPut (15 ms) > [ RUN ] UcxErrorHandlingTest.TestDoExchange > /arrow/cpp/src/arrow/util/future.cc:125: Check failed: > !IsFutureFinished(state_) Future already marked finished > {code} > Here is the GDB backtrace: > {code} > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > #1 0x7f18c49cd7f1 in __GI_abort () at abort.c:79 > #2 0x7f18c5854e00 in arrow::util::CerrLog::~CerrLog > (this=0x7f18a81607b0, __in_chrg=) at > /arrow/cpp/src/arrow/util/logging.cc:72 > #3 0x7f18c5854e1c in arrow::util::CerrLog::~CerrLog > (this=0x7f18a81607b0, __in_chrg=) at > /arrow/cpp/src/arrow/util/logging.cc:74 > #4 0x7f18c5855181 in arrow::util::ArrowLog::~ArrowLog > (this=0x7f18c07fc380, __in_chrg=) at > /arrow/cpp/src/arrow/util/logging.cc:250 > #5 0x7f18c5826f86 in arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed > (this=0x7f18a815f030, state=arrow::FutureState::FAILURE) > at /arrow/cpp/src/arrow/util/future.cc:125 > #6 0x7f18c58265af in arrow::ConcreteFutureImpl::DoMarkFailed > (this=0x7f18a815f030) at /arrow/cpp/src/arrow/util/future.cc:40 > #7 0x7f18c5827660 in arrow::FutureImpl::MarkFailed (this=0x7f18a815f030) > at 
/arrow/cpp/src/arrow/util/future.cc:195 > #8 0x7f18c80ff8d8 in > arrow::Future > >::DoMarkFinished (this=0x7f18a815efb0, res=...) > at /arrow/cpp/src/arrow/util/future.h:660 > #9 0x7f18c80fb37d in > arrow::Future > >::MarkFinished (this=0x7f18a815efb0, res=...) > at /arrow/cpp/src/arrow/util/future.h:403 > #10 0x7f18c80f5ae3 in > arrow::flight::transport::ucx::UcpCallDriver::Impl::Push > (this=0x7f18a804d2d0, status=...) > at /arrow/cpp/src/arrow/flight/transport/ucx/ucx_internal.cc:780 > #11 0x7f18c80f5c1f in > arrow::flight::transport::ucx::UcpCallDriver::Impl::RecvActiveMessage > (this=0x7f18a804d2d0, header=0x7f18c8081865, header_length=12, > data=0x7f18c8081864, data_length=1, param=0x7f18c07fc680) at > /arrow/cpp/src/arrow/flight/transport/ucx/ucx_internal.cc:791 > #12 0x7f18c80f7d29 in > arrow::flight::transport::ucx::UcpCallDriver::RecvActiveMessage > (this=0x7f18b80017e0, header=0x7f18c8081865, header_length=12, > data=0x7f18c8081864, data_length=1, param=0x7f18c07fc680) at > /arrow/cpp/src/arrow/flight/transport/ucx/ucx_internal.cc:1082 > #13 0x7f18c80e3ea4 in arrow::flight::transport::ucx::(anonymous > namespace)::UcxServerImpl::HandleIncomingActiveMessage (self=0x7f18a80259a0, > header=0x7f18c8081865, header_length=12, data=0x7f18c8081864, > data_length=1, param=0x7f18c07fc680) > at /arrow/cpp/src/arrow/flight/transport/ucx/ucx_server.cc:586 > #14 0x7f18c4661a09 in ucp_am_invoke_cb (recv_flags=, > reply_ep=, data_length=1, data=, > user_hdr_length=, user_hdr=0x7f18c8081865, am_id=4132, > worker=) at core/ucp_am.c:1220 > #15 ucp_am_handler_common (name=, recv_flags= out>, am_flags=0, reply_ep=, total_length=, > am_hdr=0x7f18c808185c, worker=) at core/ucp_am.c:1289 > #16 ucp_am_handler_reply (am_arg=, am_data=, > am_length=, am_flags=) at core/ucp_am.c:1327 > #17 0x7f18c28e3f1c in uct_iface_invoke_am (flags=0, length=29, > data=0x7f18c808185c, id=, iface=0x7f18a8027e20) > at /usr/local/src/conda/ucx-1.13.1/src/uct/base/uct_iface.h:861 > #18 
uct_mm_iface_invoke_am (flags=0, length=29, data=0x7f18c808185c, > am_id=, iface=0x7f18a8027e20) at sm/mm/base/mm_iface.h:256 > #19 uct_mm_iface_process_recv (iface=0x7f18a8027e20) at > sm/mm/base/mm_iface.c:256 > #20 uct_mm_iface_poll_fifo (iface=0x7f18a8027e20) at sm/mm/base/mm_iface.c:304 > #21 uct_mm_iface_progress (tl_iface=0x7f18a8027e20) at > sm/mm/base/mm_iface.c:357 > #22 0x7f18c4686e22 in ucs_callbackq_dispatch (cbq=) at > /usr/local/src/conda/ucx-1.13.1/src/ucs/datastruct/callbackq.h:211 > #23 uct_worker_progress (worker=) at > /usr/local/src/conda/ucx-1.13.1/src/uct/api/uct.h:2638 > #24
[jira] [Updated] (ARROW-18231) [C++] Cannot override optimization level using CXXFLAGS
[ https://issues.apache.org/jira/browse/ARROW-18231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18231: --- Labels: pull-request-available (was: ) > [C++] Cannot override optimization level using CXXFLAGS > --- > > Key: ARROW-18231 > URL: https://issues.apache.org/jira/browse/ARROW-18231 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In release mode, Arrow C++ unconditionally adds {{-O2}} _at the end_ of the > compiler flags. > So, if you do something like: > {code:bash} > export CXXFLAGS=-O0 > cmake ... > {code} > the final compilation flags will look like {{-O0 -O2}}, meaning the > user-provided optimization level is ignored. > One can instead use the {{ARROW_CXXFLAGS}} CMake variable, but it only > overrides the flags for Arrow itself, not the bundled dependencies. -- This message was sent by Atlassian Jira (v8.20.10#820010)
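The clobbering comes from GCC and Clang honouring the *last* optimization flag on the command line; schematically (a sketch of the flag-resolution rule, not of the build system):

```python
# GCC/Clang take the last -O flag, so appending -O2 after the user's
# CXXFLAGS silently discards -O0.
def effective_opt(flags):
    level = None
    for flag in flags:
        if flag.startswith("-O"):
            level = flag
    return level

assert effective_opt(["-O0", "-O2"]) == "-O2"  # user flag ignored (the bug)
assert effective_opt(["-O2", "-O0"]) == "-O0"  # desired ordering
```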
[jira] [Updated] (ARROW-18438) [Go] firstTimeBitmapWriter.Finish() panics with 8n structs
[ https://issues.apache.org/jira/browse/ARROW-18438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18438: --- Labels: pull-request-available (was: ) > [Go] firstTimeBitmapWriter.Finish() panics with 8n structs > -- > > Key: ARROW-18438 > URL: https://issues.apache.org/jira/browse/ARROW-18438 > Project: Apache Arrow > Issue Type: Bug > Components: Go, Parquet >Affects Versions: 10.0.1 >Reporter: Min-Young Wu >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Even after [ARROW-17169|https://issues.apache.org/jira/browse/ARROW-17169] I > still get a panic at the same location. > Below is a test case that panics: > {code:go} > func (ps *ParquetIOTestSuite) TestStructWithListOf8Structs() { > bldr := array.NewStructBuilder(memory.DefaultAllocator, arrow.StructOf( > arrow.Field{ > Name: "l", > Type: arrow.ListOf(arrow.StructOf( > arrow.Field{Name: "a", Type: > arrow.BinaryTypes.String}, > )), > }, > )) > defer bldr.Release() > lBldr := bldr.FieldBuilder(0).(*array.ListBuilder) > stBldr := lBldr.ValueBuilder().(*array.StructBuilder) > aBldr := stBldr.FieldBuilder(0).(*array.StringBuilder) > bldr.AppendNull() > bldr.Append(true) > lBldr.Append(true) > for i := 0; i < 8; i++ { > stBldr.Append(true) > aBldr.Append(strconv.Itoa(i)) > } > arr := bldr.NewArray() > defer arr.Release() > field := arrow.Field{Name: "x", Type: arr.DataType(), Nullable: true} > expected := array.NewTable(arrow.NewSchema([]arrow.Field{field}, nil), > []arrow.Column{*arrow.NewColumn(field, > arrow.NewChunked(field.Type, []arrow.Array{arr}))}, -1) > defer expected.Release() > ps.roundTripTable(expected, false) > } > {code} > I've tried to trim down the input data and this is as minimal as I could get > it. And yes: > * wrapping struct with initial null is required > * the inner list needs to contain 8 structs (or any multiple of 8) -- This message was sent by Atlassian Jira (v8.20.10#820010)
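One plausible reading of the "multiple of 8" condition is a byte-boundary edge case in the writer's finalization. The sketch below is an illustration of that edge case only, not the arrow/go implementation:

```python
# Schematic bitmap-writer Finish(): when the bit count is an exact
# multiple of 8 there is no trailing partial byte, so an unconditional
# flush of buf[nbits // 8] would index one past the allocated buffer.
def finish(nbits):
    buf = bytearray((nbits + 7) // 8)   # bytes actually allocated
    full_bytes, rem_bits = divmod(nbits, 8)
    if rem_bits != 0:
        buf[full_bytes] |= 0            # flush the trailing partial byte
    assert full_bytes == len(buf) or rem_bits != 0
    return buf

assert len(finish(8)) == 1    # 8n bits: no partial byte exists
assert len(finish(9)) == 2
```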
[jira] [Updated] (ARROW-18436) [Python] `FileSystem.from_uri` doesn't decode %-encoded characters in path
[ https://issues.apache.org/jira/browse/ARROW-18436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18436: --- Labels: pull-request-available (was: ) > [Python] `FileSystem.from_uri` doesn't decode %-encoded characters in path > -- > > Key: ARROW-18436 > URL: https://issues.apache.org/jira/browse/ARROW-18436 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 10.0.1 > Environment: - OS: macOS > - `python=3.9.15:h709bd14_0_cpython` (installed from conda-forge) > - `pyarrow=10.0.1:py39h2db5b05_1_cpu` (installed from conda-forge) >Reporter: James Bourbeau >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > When attempting to create a new filesystem object from a public dataset in > S3, where there is a space in the bucket name, an error is raised. > > Here's a minimal reproducer: > {code:java} > from pyarrow.fs import FileSystem > result = FileSystem.from_uri("s3://nyc-tlc/trip > data/fhvhv_tripdata_2022-06.parquet") {code} > which fails with the following traceback: > > {code:java} > Traceback (most recent call last): > File "/Users/james/projects/dask/dask/test.py", line 3, in > result = FileSystem.from_uri("s3://nyc-tlc/trip > data/fhvhv_tripdata_2022-06.parquet") > File "pyarrow/_fs.pyx", line 470, in pyarrow._fs.FileSystem.from_uri > File "pyarrow/error.pxi", line 144, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Cannot parse URI: 's3://nyc-tlc/trip > data/fhvhv_tripdata_2022-06.parquet'{code} > > Note that things work if I use a different dataset that doesn't have a space > in the URI, or if I replace the portion of the URI that has a space with a > `*` wildcard > > {code:java} > from pyarrow.fs import FileSystem > result = FileSystem.from_uri("s3://ursa-labs-taxi-data/2009/01/data.parquet") > # 
works > result = > FileSystem.from_uri("s3://nyc-tlc/*/fhvhv_tripdata_2022-06.parquet") # works > {code} > > The wildcard isn't necessarily equivalent to the original failing URI, but I > think highlights that the space is somehow problematic. -- This message was sent by Atlassian Jira (v8.20.10#820010)
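The offending character can be worked around by percent-encoding the path before parsing, which is also why `from_uri` must *decode* the path again, the subject of this issue. A small standard-library sketch:

```python
from urllib.parse import quote, unquote

raw = "nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet"
encoded = quote(raw)  # '/' is kept by default; the space becomes %20
assert encoded == "nyc-tlc/trip%20data/fhvhv_tripdata_2022-06.parquet"
# A URI parser that stores the path verbatim must unquote it, or the
# filesystem would look up a literal 'trip%20data' directory.
assert unquote(encoded) == raw
```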
[jira] [Updated] (ARROW-14832) [R] Implement bindings for stringr::str_remove and stringr::str_remove_all
[ https://issues.apache.org/jira/browse/ARROW-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-14832: --- Labels: pull-request-available (was: ) > [R] Implement bindings for stringr::str_remove and stringr::str_remove_all > -- > > Key: ARROW-14832 > URL: https://issues.apache.org/jira/browse/ARROW-14832 > Project: Apache Arrow > Issue Type: Sub-task > Components: R >Reporter: Dragoș Moldovan-Grünfeld >Assignee: Nicola Crane >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > https://stringr.tidyverse.org/reference/str_remove.html explains that it is > an alias for str_replace with "". -- This message was sent by Atlassian Jira (v8.20.10#820010)
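The aliasing relationship is tiny; transliterated into Python (stringr itself is R, so this only mirrors the documented semantics):

```python
import re

# str_remove(x, p) is str_replace(x, p, "") — empty replacement;
# the *_all variant replaces every match instead of only the first.
def str_remove(string, pattern):
    return re.sub(pattern, "", string, count=1)

def str_remove_all(string, pattern):
    return re.sub(pattern, "", string)

assert str_remove("banana", "an") == "bana"
assert str_remove_all("banana", "an") == "ba"
```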
[jira] [Updated] (ARROW-18434) [C++][Parquet] Parquet page index read support
[ https://issues.apache.org/jira/browse/ARROW-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18434: --- Labels: pull-request-available (was: ) > [C++][Parquet] Parquet page index read support > -- > > Key: ARROW-18434 > URL: https://issues.apache.org/jira/browse/ARROW-18434 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Gang Wu >Assignee: Gang Wu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Implement read support for parquet page index and expose it from the reader > API. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18437) [C++] Parquet DELTA_BINARY_PACKED Page didn't clear the context
[ https://issues.apache.org/jira/browse/ARROW-18437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18437: --- Labels: pull-request-available (was: ) > [C++] Parquet DELTA_BINARY_PACKED Page didn't clear the context > --- > > Key: ARROW-18437 > URL: https://issues.apache.org/jira/browse/ARROW-18437 > Project: Apache Arrow > Issue Type: Bug > Components: Parquet >Affects Versions: 11.0.0 >Reporter: Xuwei Fu >Assignee: Xuwei Fu >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > When calling {{{}flushValues{}}}, it didn't: > * clearing the {{total_value_count_}} > * Re-advancing buffer for {{kMaxPageHeaderWriterSize}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18425) Add support for Substrait round expression
[ https://issues.apache.org/jira/browse/ARROW-18425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18425: --- Labels: pull-request-available (was: ) > Add support for Substrait round expression > -- > > Key: ARROW-18425 > URL: https://issues.apache.org/jira/browse/ARROW-18425 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Bryce Mecum >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Work has been started on adding round to Substrait in > [https://github.com/substrait-io/substrait/pull/322] and it looks like a > mapping needs to be registered on the Acero side for Acero to consume plans > with it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18099) [Python] Cannot create pandas categorical from table only with nulls
[ https://issues.apache.org/jira/browse/ARROW-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18099: --- Labels: pull-request-available python-conversion (was: python-conversion) > [Python] Cannot create pandas categorical from table only with nulls > > > Key: ARROW-18099 > URL: https://issues.apache.org/jira/browse/ARROW-18099 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 9.0.0 > Environment: OSX 12.6 > M1 silicon >Reporter: Damian Barabonkov >Priority: Minor > Labels: pull-request-available, python-conversion > Time Spent: 10m > Remaining Estimate: 0h > > A pyarrow Table with only null values cannot be instantiated as a Pandas > DataFrame with said column as a category. However, pandas does support > "empty" categoricals. Therefore, a simple patch would be to load the pa.Table > as an object first and convert, once in pandas, to a categorical which will > be empty. However, that does not solve the pyarrow bug at its root. > > Sample reproducible example > {code:java} > import pyarrow as pa > pylist = [{'x': None, '__index_level_0__': 2}, {'x': None, > '__index_level_0__': 3}] > tbl = pa.Table.from_pylist(pylist) > > # Errors > df_broken = tbl.to_pandas(categories=["x"]) > > # Works > df_works = tbl.to_pandas() > df_works = df_works.astype({"x": "category"}) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18435) [C++][Java] Update ORC to 1.8.1
[ https://issues.apache.org/jira/browse/ARROW-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18435: --- Labels: pull-request-available (was: ) > [C++][Java] Update ORC to 1.8.1 > --- > > Key: ARROW-18435 > URL: https://issues.apache.org/jira/browse/ARROW-18435 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Java >Reporter: Gang Wu >Assignee: Gang Wu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17637) [R] as.Date fails going from timestamp[us] to timestamp[s]
[ https://issues.apache.org/jira/browse/ARROW-17637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17637: --- Labels: pull-request-available (was: ) > [R] as.Date fails going from timestamp[us] to timestamp[s] > -- > > Key: ARROW-17637 > URL: https://issues.apache.org/jira/browse/ARROW-17637 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Nicola Crane >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Using as.Date to convert from timestamp to date fails in Arrow even though > this is fine in R. > {code:r} > library(arrow) > library(dplyr) > library(lubridate) > tf <- tempfile() > dir.create(tf) > tbl <- tibble::tibble(x = as_datetime('2022-05-05T00:00:01.676632')) > write_dataset(tbl, tf) > open_dataset(tf) %>% > mutate(date = as.Date(x)) %>% > collect() > #> Error in `collect()`: > #> ! Invalid: Casting from timestamp[us, tz=UTC] to timestamp[s, tz=UTC] > would lose data: 1651708801676632 > #> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:799 > kernel_->exec(kernel_ctx_, input, out) > #> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:767 > ExecuteSingleSpan(input, ) > #> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:597 > executor->Execute( ExecBatch(std::move(arguments), all_scalar ? 
1 : > input.length), ) > #> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:579 > ExecuteScalarExpression(call->arguments[i], input, exec_context) > #> /home/nic2/arrow/cpp/src/arrow/compute/exec/project_node.cc:91 > ExecuteScalarExpression(simplified_expr, target, plan()->exec_context()) > #> /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:573 > iterator_.Next() > #> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:337 ReadNext() > #> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:351 ToRecordBatches() > tbl %>% > mutate(date = as.Date(x)) > #> # A tibble: 1 × 2 > #> x date > #> > #> 1 2022-05-05 00:00:01 2022-05-05 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
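The cast really is lossy at second granularity, but date extraction never needs the sub-second part, which is why base R succeeds. Checking the exact value from the error message:

```python
from datetime import datetime, timezone

us = 1651708801676632            # microseconds, from the error above
assert us % 1_000_000 != 0       # fractional second: a us -> s cast loses data
# Extracting the *date* only needs day granularity, so truncating is safe:
dt = datetime.fromtimestamp(us // 1_000_000, tz=timezone.utc)
assert dt.date().isoformat() == "2022-05-05"
```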
[jira] [Updated] (ARROW-18427) [C++] Support negative tolerance in `AsofJoinNode`
[ https://issues.apache.org/jira/browse/ARROW-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18427: --- Labels: pull-request-available (was: ) > [C++] Support negative tolerance in `AsofJoinNode` > -- > > Key: ARROW-18427 > URL: https://issues.apache.org/jira/browse/ARROW-18427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, `AsofJoinNode` supports a tolerance that is non-negative, allowing > past-joining, i.e., joining right-table rows with a timestamp at or before > that of the left-table row. This issue will add support for a negative > tolerance, which would allow future-joining too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17361) [R] dplyr::summarize fails with division when divisor is a variable
[ https://issues.apache.org/jira/browse/ARROW-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17361: --- Labels: aggregation dplyr pull-request-available (was: aggregation dplyr) > [R] dplyr::summarize fails with division when divisor is a variable > --- > > Key: ARROW-17361 > URL: https://issues.apache.org/jira/browse/ARROW-17361 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 8.0.0 >Reporter: Oliver Reiter >Priority: Minor > Labels: aggregation, dplyr, pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Hello, > I found this odd behaviour when trying to compute an aggregate with > dplyr::summarize: When I want to use a pre-defined variable to do a division > while aggregating, the execution fails with 'unsupported expression'. When I > use the value of the variable as is in the aggregation, it works. > > See below: > > {code:java} > library(dplyr) > library(arrow) > small_dataset <- tibble::tibble( > ## x = rep(c("a", "b"), each = 5), > y = rep(1:5, 2) > ) > ## convert "small_dataset" into a ...dataset > tmpdir <- tempfile() > dir.create(tmpdir) > write_dataset(small_dataset, tmpdir) > ## works > open_dataset(tmpdir) %>% > summarize(value = sum(y) / 10) %>% > collect() > ## fails > scale_factor <- 10 > open_dataset(tmpdir) %>% > summarize(value = sum(y) / scale_factor) %>% > collect() > #> Fehler: Error in summarize_eval(names(exprs)[i], > #> exprs[[i]], ctx, length(.data$group_by_vars) > : > # Expression sum(y)/scale_factor is not an aggregate > # expression or is not supported in Arrow > # Call collect() first to pull data into R. > {code} > I was not sure how to name this issue/bug (if it is one), so if there is a > clearer, more descriptive title you're welcome to adjust. > > Thanks for your work! 
> > Oliver > > {code:java} > > arrow_info() > Arrow package version: 8.0.0 > Capabilities: > > dataset TRUE > substrait FALSE > parquet TRUE > json TRUE > s3 TRUE > utf8proc TRUE > re2 TRUE > snappy TRUE > gzip TRUE > brotli TRUE > zstd TRUE > lz4 TRUE > lz4_frame TRUE > lzo FALSE > bz2 TRUE > jemalloc TRUE > mimalloc TRUE > Memory: > > Allocator jemalloc > Current 64 bytes > Max 41.25 Kb > Runtime: > > SIMD Level avx2 > Detected SIMD Level avx2 > Build: > > C++ Library Version 8.0.0 > C++ Compiler GNU > C++ Compiler Version 12.1.0 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17054) [R] Creating an Array from an object bigger than 2^31 results in an Array of length 0
[ https://issues.apache.org/jira/browse/ARROW-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17054: --- Labels: pull-request-available (was: ) > [R] Creating an Array from an object bigger than 2^31 results in an Array of > length 0 > - > > Key: ARROW-17054 > URL: https://issues.apache.org/jira/browse/ARROW-17054 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Nicola Crane >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Apologies for the lack of proper reprex but it crashes my session when I try > to make one. > I'm working on ARROW-16977 which is all about the reporting of object size > having integer overflow issues, but this affects object creation. > {code:r} > library(arrow, warn.conflicts = TRUE) > # works - creates a huge array, hurrah > big_logical <- vector(mode = "logical", length = .Machine$integer.max) > big_logical_array <- Array$create(big_logical) > length(big_logical) > ## [1] 2147483647 > length(big_logical_array) > ## [1] 2147483647 > # creates an array of length 0, boo! > too_big <- vector(mode = "logical", length = .Machine$integer.max + 1) > too_big_array <- Array$create(too_big) > length(too_big) > ## [1] 2147483648 > length(too_big_array) > ## [1] 0 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
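One plausible mechanism for the zero-length result — an assumption, not a confirmed diagnosis — is the length passing through a signed 32-bit integer somewhere in the conversion path, where 2^31 wraps to a negative value that gets clamped to zero. Sketched with Python's ctypes:

```python
import ctypes

# R's .Machine$integer.max is 2^31 - 1; one past it overflows int32.
n = 2**31

fits = ctypes.c_int32(n - 1).value   # largest representable value
wrapped = ctypes.c_int32(n).value    # wraps to a negative number

print(fits)     # 2147483647
print(wrapped)  # -2147483648
```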
[jira] [Updated] (ARROW-17838) [Python] Unify CMakeLists.txt at python/CMakeLists.txt and python/src/CMakeLists.txt
[ https://issues.apache.org/jira/browse/ARROW-17838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17838: --- Labels: pull-request-available (was: ) > [Python] Unify CMakeLists.txt at python/CMakeLists.txt and > python/src/CMakeLists.txt > > > Key: ARROW-17838 > URL: https://issues.apache.org/jira/browse/ARROW-17838 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-16266) [R] Add StructArray$create()
[ https://issues.apache.org/jira/browse/ARROW-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-16266: --- Labels: pull-request-available (was: ) > [R] Add StructArray$create() > > > Key: ARROW-16266 > URL: https://issues.apache.org/jira/browse/ARROW-16266 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Dewey Dunnington >Assignee: Nicola Crane >Priority: Critical > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > In ARROW-13371 we implemented the {{make_struct}} compute function bound to > {{data.frame()}} / {{tibble()}} in dplyr evaluation; however, we didn't > actually implement {{StructArray$create()}}. In ARROW-15168, it turns out > that we need to do this to support {{StructArray}} creation from data.frames > whose columns aren't all convertible using the internal C++ conversion. The > hack used in that PR is below (but we should clearly implement the C++ > function instead of using the hack): > {code:R} > library(arrow, warn.conflicts = FALSE) > struct_array <- function(...) { > batch <- record_batch(...) > array_ptr <- arrow:::allocate_arrow_array() > schema_ptr <- arrow:::allocate_arrow_schema() > batch$export_to_c(array_ptr, schema_ptr) > Array$import_from_c(array_ptr, schema_ptr) > } > struct_array(a = 1, b = "two") > #> StructArray > #> > > #> -- is_valid: all not null > #> -- child 0 type: double > #> [ > #> 1 > #> ] > #> -- child 1 type: string > #> [ > #> "two" > #> ] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18401) [R] Failing test on test-r-rhub-ubuntu-gcc-release-latest
[ https://issues.apache.org/jira/browse/ARROW-18401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18401: --- Labels: pull-request-available (was: ) > [R] Failing test on test-r-rhub-ubuntu-gcc-release-latest > - > > Key: ARROW-18401 > URL: https://issues.apache.org/jira/browse/ARROW-18401 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Dewey Dunnington >Assignee: Dewey Dunnington >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I think this is an R problem where a string is not getting > converted to a timestamp (given that the kernel mentioned in the error > probably doesn't, and shouldn't, exist). > https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=40090=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=d9b15392-e4ce-5e4c-0c8c-b69645229181=22256 > {code:java} > ══ Failed tests > > ── Error ('test-dplyr-query.R:694'): Scalars in expressions match the type of > the field, if possible ── > Error: NotImplemented: Function 'greater' has no kernel matching input types > (timestamp[us, tz=UTC], string) > Backtrace: > ▆ > 1. ├─testthat::expect_output(...) at test-dplyr-query.R:694:2 > 2. │ └─testthat:::quasi_capture(...) > 3. │ ├─testthat (local) .capture(...) > 4. │ │ └─testthat::capture_output_lines(code, print, width = width) > 5. │ │ └─testthat:::eval_with_output(code, print = print, width = width) > 6. │ │ ├─withr::with_output_sink(path, withVisible(code)) > 7. │ │ │ └─base::force(code) > 8. │ │ └─base::withVisible(code) > 9. │ └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo)) > 10. ├─tab %>% filter(times > "2018-10-07 19:04:05") %>% ... > 11. └─arrow::show_exec_plan(.) > 12. ├─arrow::as_record_batch_reader(adq) > 13. └─arrow:::as_record_batch_reader.arrow_dplyr_query(adq) > 14. └─plan$Build(x) > 15. └─node$Filter(.data$filtered_rows) > 16. ├─self$preserve_extras(ExecNode_Filter(self, expr)) > 17. 
└─arrow:::ExecNode_Filter(self, expr) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18429) [R] Bump dev version following 10.0.1 patch release
[ https://issues.apache.org/jira/browse/ARROW-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18429: --- Labels: pull-request-available (was: ) > [R] Bump dev version following 10.0.1 patch release > --- > > Key: ARROW-18429 > URL: https://issues.apache.org/jira/browse/ARROW-18429 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, R >Reporter: Nicola Crane >Assignee: Nicola Crane >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > CI job fails with: > {code:java} >Insufficient package version (submitted: 10.0.0.9000, existing: 10.0.1) > Version contains large components (10.0.0.9000) > {code} > https://github.com/apache/arrow/actions/runs/3639669477/jobs/6145488845#step:10:567 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18320) [C++] Flight client may crash due to improper Result/Status conversion
[ https://issues.apache.org/jira/browse/ARROW-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18320: --- Labels: pull-request-available (was: ) > [C++] Flight client may crash due to improper Result/Status conversion > -- > > Key: ARROW-18320 > URL: https://issues.apache.org/jira/browse/ARROW-18320 > Project: Apache Arrow > Issue Type: Bug > Components: C++, FlightRPC >Affects Versions: 6.0.0 >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Reported on user@ > https://lists.apache.org/thread/84z329t1djhnbr5bq936v4hr8cyngj2l > {noformat} > I have an issue on my project, we have a query execution engine that > returns result data as a flight stream and a C++ client that receives the > stream. In case a query has no results but the result schema implies > dictionary-encoded fields in the results, the client app crashes. > The cause is in cpp/src/arrow/flight/client.cc:461: > ::arrow::Result<std::unique_ptr<ipc::Message>> ReadNextMessage() override { > if (stream_finished_) { > return nullptr; > } > internal::FlightData* data; > { > auto guard = read_mutex_ ? std::unique_lock<std::mutex>(*read_mutex_) > : std::unique_lock<std::mutex>(); > peekable_reader_->Next(&data); > } > if (!data) { > stream_finished_ = true; > return stream_->Finish(Status::OK()); // Here the issue > } > // Validate IPC message > auto result = data->OpenMessage(); > if (!result.ok()) { > return stream_->Finish(std::move(result).status()); > } > *app_metadata_ = std::move(data->app_metadata); > return result; > } > The method returns a Result object while stream_->Finish(..) returns a Status. > So there is an implicit conversion from Status to Result that causes the > Result(Status) constructor to be called, but the constructor expects only > error statuses, which in turn causes the app to fail: > /// Constructs a Result object with the given non-OK Status object. All > /// calls to ValueOrDie() on this object will abort. 
The given `status` must > /// not be an OK status, otherwise this constructor will abort. > /// > /// This constructor is not declared explicit so that a function with a > return > /// type of `Result` can return a Status object, and the status will be > /// implicitly converted to the appropriate return type as a matter of > /// convenience. > /// > /// \param status The non-OK Status object to initialize to. > Result(const Status& status) noexcept // NOLINT(runtime/explicit) > : status_(status) { > if (ARROW_PREDICT_FALSE(status.ok())) { > internal::DieWithMessage(std::string("Constructed with a non-error status: ") > + > status.ToString()); > } > } > Is there a way to workaround or fix it? We use Arrow 6.0.0, but it seems > that the issue exists in all future versions. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
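The failure mode above can be sketched in miniature. The following Python analogue (hypothetical names; the real code is C++) reproduces the pattern of an error-only constructor being fed an OK status; the corresponding fix on the C++ side would be to detect a clean end of stream and return a null message rather than forwarding Finish(Status::OK()) through the implicit conversion:

```python
class Result:
    """Holds either a value or a non-OK error, like arrow::Result<T>."""
    def __init__(self, value=None, error=None):
        if error == "OK":
            # Mirrors the abort in Result(const Status&) when status.ok().
            raise RuntimeError("Constructed with a non-error status: OK")
        self.value, self.error = value, error

def read_next_message(stream_finished, finish_status):
    if stream_finished:
        # Bug analogue: a clean stream finishes with "OK", which the
        # error-only constructor rejects, crashing the client.
        return Result(error=finish_status)
    return Result(value="message")

# A query with no result batches finishes cleanly and hits the abort path:
try:
    read_next_message(True, "OK")
except RuntimeError as e:
    print(e)
```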
[jira] [Updated] (ARROW-18424) [C++] Fix Doxygen error on `arrow::engine::ConversionStrictness`
[ https://issues.apache.org/jira/browse/ARROW-18424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18424: --- Labels: pull-request-available (was: ) > [C++] Fix Doxygen error on `arrow::engine::ConversionStrictness` > > > Key: ARROW-18424 > URL: https://issues.apache.org/jira/browse/ARROW-18424 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Doxygen is hitting the following error: > `/arrow/cpp/src/arrow/engine/substrait/options.h:37: error: documented symbol > 'enum ARROW_ENGINE_EXPORT arrow::engine::arrow::engine::ConversionStrictness' > was not declared or defined. (warning treated as error, aborting now)`. See > [this CI job > output|https://github.com/apache/arrow/actions/runs/3557712768/jobs/5975904381], > for example. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18281) [C++][Python] Support start == stop in list_slice kernel
[ https://issues.apache.org/jira/browse/ARROW-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18281: --- Labels: C++ Python pull-request-available (was: C++ Python) > [C++][Python] Support start == stop in list_slice kernel > > > Key: ARROW-18281 > URL: https://issues.apache.org/jira/browse/ARROW-18281 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Miles Granger >Assignee: Miles Granger >Priority: Major > Labels: C++, Python, pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > [GitHub PR 14395 | https://github.com/apache/arrow/pull/14395] adds the > {{list_slice}} kernel, but does not implement the case where {{start == > stop}}, which should return empty lists. -- This message was sent by Atlassian Jira (v8.20.10#820010)
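The requested semantics mirror ordinary slicing, where a range with start equal to stop yields an empty list rather than an error. In plain Python terms (an illustration of the expected behaviour, not the kernel itself):

```python
values = [[1, 2, 3], [4, 5], []]

# A list_slice with start == stop (here 1 and 1) should behave like
# lst[1:1] applied element-wise: every output is an empty list.
sliced = [lst[1:1] for lst in values]
print(sliced)  # [[], [], []]
```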
[jira] [Updated] (ARROW-18423) [Python] Expose reading a schema from an IPC message
[ https://issues.apache.org/jira/browse/ARROW-18423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18423: --- Labels: pull-request-available (was: ) > [Python] Expose reading a schema from an IPC message > > > Key: ARROW-18423 > URL: https://issues.apache.org/jira/browse/ARROW-18423 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Andre Kohn >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Pyarrow currently does not implement reading the Arrow schema from an IPC > message. > [https://github.com/apache/arrow/blob/80b389efe902af376a85a8b3740e0dbdc5f80900/python/pyarrow/ipc.pxi#L1094] > > We'd like to consume Arrow IPC stream data like the following: > > {code:java} > schema_msg = pyarrow.ipc.read_message(result_iter.next().data) > schema = pyarrow.ipc.read_schema(schema_msg) > for batch_data in result_iter: > batch_msg = pyarrow.ipc.read_message(batch_data.data) > batch = pyarrow.ipc.read_record_batch(batch_msg, schema){code} > > The associated (tiny) PR on GitHub implements this reading by binding the > existing C++ function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18422) [C++] Provide enum reflection utility
[ https://issues.apache.org/jira/browse/ARROW-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18422: --- Labels: pull-request-available (was: ) > [C++] Provide enum reflection utility > - > > Key: ARROW-18422 > URL: https://issues.apache.org/jira/browse/ARROW-18422 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Ben Kietzman >Assignee: Ben Kietzman >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now that we have c++17, we could try again with ARROW-13296 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-17374) [R] R Arrow install fails with SNAPPY_LIB-NOTFOUND
[ https://issues.apache.org/jira/browse/ARROW-17374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17374: --- Labels: pull-request-available (was: ) > [R] R Arrow install fails with SNAPPY_LIB-NOTFOUND > -- > > Key: ARROW-17374 > URL: https://issues.apache.org/jira/browse/ARROW-17374 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 8.0.0, 8.0.1, 9.0.0 > Environment: Amazon Linux 2 (RHEL) - 5.10.102-99.473.amzn2.x86_64 >Reporter: Shane Brennan >Priority: Blocker > Labels: pull-request-available > Attachments: build-images.out, environment.yml > > Time Spent: 10m > Remaining Estimate: 0h > > I've been trying to install Arrow on an R notebook within AWS SageMaker. > SageMaker provides Jupyter-like notebooks, with each instance running Amazon > Linux 2 as its OS, itself based on RHEL. > Trying to install a few ways, e.g., using the standard binaries, using the > nightly builds, setting ARROW_WITH_SNAPPY to ON and LIBARROW_MINIMAL all > still result in the following error. 
> {noformat} > x86_64-conda-linux-gnu-c++ -std=gnu++11 -shared > -L/home/ec2-user/anaconda3/envs/R/lib/R/lib -Wl,-O2 -Wl,--sort-common > -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags > -Wl,--gc-sections -Wl,--allow-shlib-undefined > -Wl,-rpath,/home/ec2-user/anaconda3/envs/R/lib > -Wl,-rpath-link,/home/ec2-user/anaconda3/envs/R/lib > -L/home/ec2-user/anaconda3/envs/R/lib -o arrow.so RTasks.o altrep.o array.o > array_to_vector.o arraydata.o arrowExports.o bridge.o buffer.o chunkedarray.o > compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o > expression.o extension-impl.o feather.o field.o filesystem.o imports.o io.o > json.o memorypool.o message.o parquet.o r_to_arrow.o recordbatch.o > recordbatchreader.o recordbatchwriter.o safe-call-into-r-impl.o scalar.o > schema.o symbols.o table.o threadpool.o type_infer.o > -L/tmp/Rtmpuh87oc/R.INSTALL67114493a3de/arrow/libarrow/arrow-9.0.0.20220809/lib > -larrow_dataset -lparquet -larrow -larrow_bundled_dependencies -lz > SNAPPY_LIB-NOTFOUND /home/ec2-user/anaconda3/envs/R/lib/libbz2.so -pthread > -larrow -larrow_bundled_dependencies -larrow_dataset -lparquet -lssl -lcrypto > -lcurl -lssl -lcrypto -lcurl -L/home/ec2-user/anaconda3/envs/R/lib/R/lib -lR > x86_64-conda-linux-gnu-c++: error: SNAPPY_LIB-NOTFOUND: No such file or > directory > make: *** [/home/ec2-user/anaconda3/envs/R/lib/R/share/make/shlib.mk:10: > arrow.so] Error 1{noformat} > Snappy is installed on the systems, and both shared object (.so) and cmake > files are there, where I've tried setting the system env variables Snappy_DIR > and Snappy_LIB to point at them, but to no avail. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18419) [C++] Update vendored fast_float
[ https://issues.apache.org/jira/browse/ARROW-18419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18419: --- Labels: pull-request-available (was: ) > [C++] Update vendored fast_float > > > Key: ARROW-18419 > URL: https://issues.apache.org/jira/browse/ARROW-18419 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > For https://github.com/fastfloat/fast_float/pull/147 . -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18333) [Go][Docs] Add and Update compute function docs
[ https://issues.apache.org/jira/browse/ARROW-18333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18333: --- Labels: pull-request-available (was: ) > [Go][Docs] Add and Update compute function docs > --- > > Key: ARROW-18333 > URL: https://issues.apache.org/jira/browse/ARROW-18333 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18421) [C++][ORC] Add accessor for number of rows by stripe in reader
[ https://issues.apache.org/jira/browse/ARROW-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18421: --- Labels: pull-request-available (was: ) > [C++][ORC] Add accessor for number of rows by stripe in reader > -- > > Key: ARROW-18421 > URL: https://issues.apache.org/jira/browse/ARROW-18421 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Louis Calot >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I need to have the number of rows by stripe to be able to read specific > ranges of records in the ORC file without reading it all. The number of rows > was already stored in the implementation but not available in the API. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18420) [C++][Parquet] Introduce ColumnIndex and OffsetIndex
[ https://issues.apache.org/jira/browse/ARROW-18420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18420: --- Labels: pull-request-available (was: ) > [C++][Parquet] Introduce ColumnIndex and OffsetIndex > > > Key: ARROW-18420 > URL: https://issues.apache.org/jira/browse/ARROW-18420 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++, Parquet >Reporter: Gang Wu >Assignee: Gang Wu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Define interface of ColumnIndex and OffsetIndex and provide implementation to > read from serialized form. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18391) [R] Fix the version selector dropdown
[ https://issues.apache.org/jira/browse/ARROW-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18391: --- Labels: pull-request-available (was: ) > [R] Fix the version selector dropdown > - > > Key: ARROW-18391 > URL: https://issues.apache.org/jira/browse/ARROW-18391 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nicola Crane >Assignee: Nicola Crane >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-17887 updates the docs to use Bootstrap 5 which will break the docs > version dropdown selector, as it relies on replacing a page element, but the > page elements are different in this version of Bootstrap. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18417) [C++] Support emit info in Substrait extension-multi and AsOfJoin
[ https://issues.apache.org/jira/browse/ARROW-18417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18417: --- Labels: pull-request-available (was: ) > [C++] Support emit info in Substrait extension-multi and AsOfJoin > - > > Key: ARROW-18417 > URL: https://issues.apache.org/jira/browse/ARROW-18417 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, Arrow-Substrait does not handle emit info that may appear in an > extension-multi in a Substrait plan. Besides the generic handling in the > Arrow-Substrait extension API, specific handling for AsOfJoin is required, > because AsOfJoinNode produces an output schema that is different than the one > used in the emit info. In particular, the AsOfJoinNode output scheme does not > include on- and by-keys of right tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18112) [Go] Remaining Scalar Unary Arithmetic (sin/cos/etc. rounding, log/ln, etc.)
[ https://issues.apache.org/jira/browse/ARROW-18112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18112: --- Labels: pull-request-available (was: ) > [Go] Remaining Scalar Unary Arithmetic (sin/cos/etc. rounding, log/ln, etc.) > > > Key: ARROW-18112 > URL: https://issues.apache.org/jira/browse/ARROW-18112 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Matthew Topol >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18412) [R] Windows build fails because of missing ChunkResolver symbols
[ https://issues.apache.org/jira/browse/ARROW-18412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18412: --- Labels: pull-request-available (was: ) > [R] Windows build fails because of missing ChunkResolver symbols > > > Key: ARROW-18412 > URL: https://issues.apache.org/jira/browse/ARROW-18412 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Dewey Dunnington >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In recent nightly builds of the Windows package we have a build failure > because some symbols related to the {{ChunkResolver}} are not found in the > linking stage. > https://github.com/ursacomputing/crossbow/actions/runs/3559717769/jobs/5979255297#step:9:2818 > [~kou] suggested the following patch might fix the build: > https://github.com/apache/arrow/pull/14530#issuecomment-1328341447 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18416) [R] Update NEWS for 10.0.1
[ https://issues.apache.org/jira/browse/ARROW-18416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18416: --- Labels: pull-request-available (was: ) > [R] Update NEWS for 10.0.1 > -- > > Key: ARROW-18416 > URL: https://issues.apache.org/jira/browse/ARROW-18416 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nicola Crane >Assignee: Nicola Crane >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18402) [C++] Expose `DeclarationInfo`
[ https://issues.apache.org/jira/browse/ARROW-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18402: --- Labels: pull-request-available (was: ) > [C++] Expose `DeclarationInfo` > -- > > Key: ARROW-18402 > URL: https://issues.apache.org/jira/browse/ARROW-18402 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > `DeclarationInfo` is just a pair of `Declaration` and `Schema`, which are > public APIs, and so can be made public API itself. This can be part of or a > follow-up on [https://github.com/apache/arrow/pull/14485], and will allow > implementing extension providers, whose API depends on `DeclarationInfo`, > outside of the Arrow repo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18123) [Python] Cannot use multi-byte characters in file names in write_table
[ https://issues.apache.org/jira/browse/ARROW-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18123: --- Labels: pull-request-available (was: ) > [Python] Cannot use multi-byte characters in file names in write_table > -- > > Key: ARROW-18123 > URL: https://issues.apache.org/jira/browse/ARROW-18123 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 9.0.0 >Reporter: SHIMA Tatsuya >Assignee: Miles Granger >Priority: Critical > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Error when specifying a file path containing multi-byte characters in > {{pyarrow.parquet.write_table}}. > For example, use {{例.parquet}} as the file path. > {code:python} > Python 3.10.7 (main, Oct 5 2022, 14:33:54) [GCC 10.2.1 20210110] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import pandas as pd > >>> import numpy as np > >>> import pyarrow as pa > >>> df = pd.DataFrame({'one': [-1, np.nan, 2.5], > ...'two': ['foo', 'bar', 'baz'], > ...'three': [True, False, True]}, > ...index=list('abc')) > >>> table = pa.Table.from_pandas(df) > >>> import pyarrow.parquet as pq > >>> pq.write_table(table, '例.parquet') > Traceback (most recent call last): > File "", line 1, in > File > "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", > line 2920, in write_table > with ParquetWriter( > File > "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", > line 911, in __init__ > filesystem, path = _resolve_filesystem_and_path( > File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/fs.py", line > 184, in _resolve_filesystem_and_path > filesystem, path = FileSystem.from_uri(path) > File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri > File "pyarrow/error.pxi", line 144, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 100, in 
pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Cannot parse URI: '例.parquet' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
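The traceback shows the bare filename being pushed through `FileSystem.from_uri`, where non-ASCII characters fail URI parsing. Until a fix lands, callers can avoid the failure by not routing plain local paths through URI parsing at all; a sketch of such a guard (hypothetical helper, not a pyarrow API):

```python
from urllib.parse import urlparse

def looks_like_uri(path):
    """Heuristic: only treat strings with an explicit scheme (s3://,
    hdfs://, ...) as URIs; everything else is a local filesystem path."""
    scheme = urlparse(path).scheme
    # A single letter is more likely a Windows drive (C:\...) than a scheme.
    return len(scheme) > 1

print(looks_like_uri("例.parquet"))              # False -> treat as local path
print(looks_like_uri("s3://bucket/例.parquet"))  # True  -> parse as URI
```

With a guard like this, a caller would pass a constructed `LocalFileSystem` plus the raw path for the local case, bypassing the URI parser entirely.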
[jira] [Updated] (ARROW-18119) [C++] Utility method to ensure an array object meets an alignment requirement
[ https://issues.apache.org/jira/browse/ARROW-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18119: --- Labels: pull-request-available (was: ) > [C++] Utility method to ensure an array object meets an alignment > requirement > > > Key: ARROW-18119 > URL: https://issues.apache.org/jira/browse/ARROW-18119 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Weston Pace >Assignee: Sanjiban Sengupta >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This would look something like: > EnsureAligned(Buffer|Array|ChunkedArray|RecordBatch|Table, int > minimum_alignment, MemoryPool* memory_pool); > It would fail if MemoryPool's alignment < minimum_alignment > It would iterate through each buffer of the object; if the buffer is not > aligned properly, it would reallocate and copy the buffer (using memory_pool) > It would return a new object where every buffer is guaranteed to meet the > alignment requirements. -- This message was sent by Atlassian Jira (v8.20.10#820010)
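At the core of such a utility sit a power-of-two alignment check and round-up arithmetic for the reallocation size. A sketch of just that arithmetic in Python (the real utility is C++ and MemoryPool-aware; these helper names are illustrative):

```python
def is_aligned(addr, alignment):
    """True if addr is a multiple of alignment (a power of two)."""
    assert alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return addr % alignment == 0

def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    assert alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return (addr + alignment - 1) & ~(alignment - 1)

print(is_aligned(64, 64))   # True  -> buffer can be used as-is
print(is_aligned(100, 64))  # False -> buffer must be reallocated and copied
print(align_up(100, 64))    # 128
```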
[jira] [Updated] (ARROW-18413) [C++][Parquet] FileMetaData exposes page index metadata
[ https://issues.apache.org/jira/browse/ARROW-18413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18413: --- Labels: pull-request-available (was: ) > [C++][Parquet] FileMetaData exposes page index metadata > --- > > Key: ARROW-18413 > URL: https://issues.apache.org/jira/browse/ARROW-18413 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++, Parquet >Reporter: Gang Wu >Assignee: Gang Wu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Parquet ColumnChunk thrift object has recorded metadata for page index: > [parquet-format/parquet.thrift at master · apache/parquet-format > (github.com)|https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L799] > We just need to add public API to ColumnChunkMetaData to make it ready to > read. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18395) [C++] Move select-k implementation into separate module
[ https://issues.apache.org/jira/browse/ARROW-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18395: --- Labels: good-second-issue pull-request-available (was: good-second-issue) > [C++] Move select-k implementation into separate module > --- > > Key: ARROW-18395 > URL: https://issues.apache.org/jira/browse/ARROW-18395 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Reporter: Antoine Pitrou >Assignee: Ben Harkins >Priority: Minor > Labels: good-second-issue, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The select-k kernel implementations are currently in {{vector_sort.cc}}, > amongst other things. > To make the code more readable and faster to compile, we should move them > into their own file. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18316) [CI] Migrate jobs on Travis CI to dev/tasks/
[ https://issues.apache.org/jira/browse/ARROW-18316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18316: --- Labels: pull-request-available (was: ) > [CI] Migrate jobs on Travis CI to dev/tasks/ > > > Key: ARROW-18316 > URL: https://issues.apache.org/jira/browse/ARROW-18316 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > https://cwiki.apache.org/confluence/display/INFRA/Travis+Migrations > {quote} > On November 2nd, 2020, Travis-CI announced the end of unlimited support for > open source projects. > Infra is therefore moving our CI offerings away from Travis-CI in order to > keep our builds pipeline cost-effective > Deadline: December 31st 2022 > Infrastructure is moving ASF projects away from using Travis-CI at the end of > 2022. > {quote} > We need to migrate jobs in {{.travis.yml}} to {{dev/tasks/}}. {{dev/tasks/}} > are triggered by Crossbow. And Crossbow used in apache/arrow ( > https://github.com/ursacomputing/crossbow/ ) can still use Travis CI ( > https://app.travis-ci.com/github/ursacomputing/crossbow ) because the Travis > CI account is sponsored by Voltron Data not ASF. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18280) [C++][Python] Support slicing to arbitrary end in list_slice kernel
[ https://issues.apache.org/jira/browse/ARROW-18280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18280: --- Labels: pull-request-available (was: ) > [C++][Python] Support slicing to arbitrary end in list_slice kernel > --- > > Key: ARROW-18280 > URL: https://issues.apache.org/jira/browse/ARROW-18280 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Miles Granger >Assignee: Miles Granger >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > [GitHub PR | https://github.com/apache/arrow/pull/14395] adds support for > the {{list_slice}} kernel, but does not handle the case {{stop == > std::nullopt}}, which should slice to the end of the list elements. -- This message was sent by Atlassian Jira (v8.20.10#820010)
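The missing stop == std::nullopt case maps directly onto Python's own slice semantics, which is the behavior being requested; a pure-Python model of the kernel's intent (not the kernel itself):

```python
# Toy model: stop=None means "slice to the end of each list element",
# exactly like lst[start:] in Python.
def list_slice(lists, start, stop=None):
    return [lst[start:stop] for lst in lists]

assert list_slice([[1, 2, 3], [4]], 1) == [[2, 3], []]
assert list_slice([[1, 2, 3]], 0, 2) == [[1, 2]]
```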
[jira] [Updated] (ARROW-10158) [C++] Add support for Parquet with Page Index
[ https://issues.apache.org/jira/browse/ARROW-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10158: --- Labels: pull-request-available (was: ) > [C++] Add support for Parquet with Page Index > - > > Key: ARROW-10158 > URL: https://issues.apache.org/jira/browse/ARROW-10158 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Parquet >Reporter: Malthe Borch >Assignee: Gang Wu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > A recent parquet-format release also added support for a [Page Index > |https://github.com/apache/parquet-format/blob/96a8f3172a3b895408d2d1b939200dd02ab8300d/PageIndex.md] > making it possible to skip pages within a RowGroup. > This should be implemented by Apache Arrow. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18106) [C++] JSON reader ignores explicit schema with default unexpected_field_behavior="infer"
[ https://issues.apache.org/jira/browse/ARROW-18106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18106: --- Labels: json pull-request-available (was: json) > [C++] JSON reader ignores explicit schema with default > unexpected_field_behavior="infer" > > > Key: ARROW-18106 > URL: https://issues.apache.org/jira/browse/ARROW-18106 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Ben Harkins >Priority: Major > Labels: json, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Not 100% sure this is a "bug", but at least I find it an unexpected interplay > between two options. > By default, when reading json, we _infer_ the data type of columns, and when > specifying an explicit schema, we _also_ by default infer the type of columns > that are not specified in the explicit schema. The docs for > {{unexpected_field_behavior}}: > > How JSON fields outside of explicit_schema (if given) are treated > But it seems that if you specify a schema, and the parsing of one of the > columns fails according to that schema, we still fall back to this default of > inferring the data type (while I would have expected an error, since we > should only infer for columns _not_ in the schema). 
> Example code using pyarrow: > {code:python} > import io > import pyarrow as pa > from pyarrow import json > s_json = """{"column":"2022-09-05T08:08:46.000"}""" > opts = json.ParseOptions(explicit_schema=pa.schema([("column", > pa.timestamp("s"))])) > json.read_json(io.BytesIO(s_json.encode()), parse_options=opts) > {code} > The parsing fails here because there are milliseconds and the type is "s", > but the explicit schema is ignored, and we get back a string column instead: > {code} > pyarrow.Table > column: string > > column: [["2022-09-05T08:08:46.000"]] > {code} > But when adding {{unexpected_field_behavior="ignore"}}, we actually get the > expected parse error: > {code:python} > opts = json.ParseOptions(explicit_schema=pa.schema([("column", > pa.timestamp("s"))]), unexpected_field_behavior="ignore") > json.read_json(io.BytesIO(s_json.encode()), parse_options=opts) > {code} > gives > {code} > ArrowInvalid: Failed of conversion of JSON to timestamp[s], couldn't > parse:2022-09-05T08:08:46.000 > {code} > It might be that this is specific to timestamps; I don't directly see a similar > issue with e.g. {{"column": "A"}} and setting the schema to "column" being > int64. -- This message was sent by Atlassian Jira (v8.20.10#820010)
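The expectation in the report can be stated as a small decision rule: a field named in the explicit schema must parse with its declared type or raise, and only fields outside the schema are governed by unexpected_field_behavior. The toy converters below are stand-ins for Arrow's real type conversion, purely to illustrate the rule:

```python
# Stand-in converters, not Arrow's; only the dispatch rule matters here.
def parse_as(value, typ):
    converters = {"int64": int, "float64": float, "string": str}
    return converters[typ](value)  # raises ValueError on a bad parse

def infer_type(value):
    # Try progressively looser types, mimicking inference.
    for typ in ("int64", "float64"):
        try:
            parse_as(value, typ)
            return typ
        except ValueError:
            pass
    return "string"

def convert_field(name, value, explicit_schema, behavior="infer"):
    if name in explicit_schema:
        # In-schema field: parse with the declared type; a failure should
        # surface as an error, never a silent fallback to inference.
        return explicit_schema[name], parse_as(value, explicit_schema[name])
    if behavior == "ignore":
        return None  # drop fields outside the schema
    return infer_type(value), value  # behavior == "infer"
```

With this rule, a value that fails its declared type raises instead of silently degrading to an inferred string column.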
[jira] [Updated] (ARROW-18410) [Packaging][Ubuntu] Add support for Ubuntu 22.10
[ https://issues.apache.org/jira/browse/ARROW-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18410: --- Labels: pull-request-available (was: ) > [Packaging][Ubuntu] Add support for Ubuntu 22.10 > > > Key: ARROW-18410 > URL: https://issues.apache.org/jira/browse/ARROW-18410 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18409) [GLib][Plasma] Suppress deprecated warning in building plasma-glib
[ https://issues.apache.org/jira/browse/ARROW-18409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18409: --- Labels: pull-request-available (was: ) > [GLib][Plasma] Suppress deprecated warning in building plasma-glib > -- > > Key: ARROW-18409 > URL: https://issues.apache.org/jira/browse/ARROW-18409 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > If we always get the "Plasma is deprecated since Arrow 10.0.0. ..." warning from > {{plasma/common.h}}, we can't use the {{-Dwerror=true}} Meson option with > plasma-glib. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18405) [Ruby] Raw table converter rebuilds chunked arrays
[ https://issues.apache.org/jira/browse/ARROW-18405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18405: --- Labels: pull-request-available (was: ) > [Ruby] Raw table converter rebuilds chunked arrays > -- > > Key: ARROW-18405 > URL: https://issues.apache.org/jira/browse/ARROW-18405 > Project: Apache Arrow > Issue Type: Bug > Components: Ruby >Affects Versions: 10.0.0 >Reporter: Sten Larsson >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Consider the following Ruby script: > {code:ruby} > require 'arrow' > data = Arrow::ChunkedArray.new([Arrow::Int64Array.new([1])]) > table = Arrow::Table.new('column' => data) > puts table['column'].data_type > {code} > This prints "int64" with red-arrow 9.0.0 and "uint8" in 10.0.0. > From my understanding it is due to this commit: > [https://github.com/apache/arrow/commit/913d9c0a9a1a4398ed5f56d713d586770b4f702c#diff-f7f19bbc3945ea30ba06d851705f2d58f7666507bb101c4e151014ca398bd635R42] > The old version would not call ArrayBuilder.build on a ChunkedArray, but the > new version does. This is a problem for us, because we need the column to > stay int64. > A workaround is to specify a schema and list of arrays instead to bypass the > raw table converter: > {code:ruby} > table = Arrow::Table.new([{name: 'column', type: 'int64'}], [data]) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18407) [Release][Website] Use UTC for release date
[ https://issues.apache.org/jira/browse/ARROW-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18407: --- Labels: pull-request-available (was: ) > [Release][Website] Use UTC for release date > --- > > Key: ARROW-18407 > URL: https://issues.apache.org/jira/browse/ARROW-18407 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools, Website >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18406) [C++] Can't build Arrow with Substrait on Ubuntu 20.04
[ https://issues.apache.org/jira/browse/ARROW-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18406: --- Labels: pull-request-available (was: ) > [C++] Can't build Arrow with Substrait on Ubuntu 20.04 > -- > > Key: ARROW-18406 > URL: https://issues.apache.org/jira/browse/ARROW-18406 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Dewey Dunnington >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I recently tried to rebuild Arrow with Substrait on Ubuntu 20.04 and got the > following error: > {code:java} > [100%] Building CXX object > src/arrow/engine/CMakeFiles/arrow_substrait_objlib.dir/substrait/type_internal.cc.o > /home/dewey/Desktop/r/arrow/cpp/src/arrow/engine/substrait/expression_internal.cc: > In function ‘arrow::Status arrow::engine::DecodeArg(const > substrait::FunctionArgument&, int, arrow::engine::SubstraitCall*, const > arrow::engine::ExtensionSet&, const arrow::engine::ConversionOptions&)’: > /home/dewey/Desktop/r/arrow/cpp/src/arrow/engine/substrait/expression_internal.cc:60:21: > error: ‘bool substrait::FunctionArgument::has_enum_() const’ is private > within this context >60 | if (arg.has_enum_()) { > | ^ > In file included from > /home/dewey/Desktop/r/arrow/cpp/src/arrow/engine/substrait/expression_internal.h:30, > from > /home/dewey/Desktop/r/arrow/cpp/src/arrow/engine/substrait/expression_internal.cc:20: > /home/dewey/.r-arrow-dev-build/build/substrait_ep-generated/substrait/algebra.pb.h:21690:13: > note: declared private here > 21690 | inline bool FunctionArgument::has_enum_() const { > | ^~~~ > [100%] Building CXX object > src/arrow/engine/CMakeFiles/arrow_substrait_objlib.dir/substrait/util.cc.o > make[2]: *** > [src/arrow/engine/CMakeFiles/arrow_substrait_objlib.dir/build.make:76: > src/arrow/engine/CMakeFiles/arrow_substrait_objlib.dir/substrait/expression_internal.cc.o] > Error 1 > make[2]: *** Waiting for 
unfinished jobs > make[1]: *** [CMakeFiles/Makefile2:2028: > src/arrow/engine/CMakeFiles/arrow_substrait_objlib.dir/all] Error 2 > make: *** [Makefile:146: all] Error 2 > {code} > [~westonpace] suggested that it is probably a protobuf version problem! For > me this is: > {code:java} > $ protoc --version > libprotoc 3.6.1 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18380) MIGRATION: Enable bot handling of GitHub issue linked PRs
[ https://issues.apache.org/jira/browse/ARROW-18380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18380: --- Labels: pull-request-available (was: ) > MIGRATION: Enable bot handling of GitHub issue linked PRs > - > > Key: ARROW-18380 > URL: https://issues.apache.org/jira/browse/ARROW-18380 > Project: Apache Arrow > Issue Type: Task > Components: Developer Tools >Reporter: Todd Farmer >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > GitHub workflows for the Apache Arrow project assume that PRs reference ASF > Jira issues (or are minor changes). This needs to be revisited now that > GitHub issue reporting is enabled, as there may well be no ASF Jira issue to > link a PR against going forward. The resulting bot comments can be confusing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18399) [Python] Reduce warnings during tests
[ https://issues.apache.org/jira/browse/ARROW-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18399: --- Labels: pull-request-available (was: ) > [Python] Reduce warnings during tests > - > > Key: ARROW-18399 > URL: https://issues.apache.org/jira/browse/ARROW-18399 > Project: Apache Arrow > Issue Type: Task > Components: Python >Reporter: Antoine Pitrou >Assignee: Miles Granger >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Numerous warnings are displayed at the end of a test run; we should strive > to reduce them: > https://github.com/apache/arrow/actions/runs/3533792571/jobs/5929880345#step:6:5489 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18113) Implement a read range process without caching
[ https://issues.apache.org/jira/browse/ARROW-18113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18113: --- Labels: pull-request-available (was: ) > Implement a read range process without caching > -- > > Key: ARROW-18113 > URL: https://issues.apache.org/jira/browse/ARROW-18113 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Percy Camilo Triveño Aucahuasi >Assignee: Percy Camilo Triveño Aucahuasi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The current > [ReadRangeCache|https://github.com/apache/arrow/blob/e06e98db356e602212019cfbae83fd3d5347292d/cpp/src/arrow/io/caching.h#L100] > mixes caching with coalescing, making it difficult to implement readers > that can truly perform concurrent reads on coalesced data (see this > [github > comment|https://github.com/apache/arrow/pull/14226#discussion_r999334979] for > additional context); for instance, right now the prebuffering feature of > those readers cannot handle concurrent invocations. > The goal for this ticket is to implement a component similar to > ReadRangeCache for performing non-cached reads (doing only the coalescing part > instead). So, once we have that new capability, we can port the parquet and > IPC readers to this new component and keep improving the reading process > (that would be part of another set of follow-up tickets). Similar ideas were > mentioned here: https://issues.apache.org/jira/browse/ARROW-17599 > Maybe a good place to implement this new capability is inside the file system > abstraction (as part of a dedicated method to read coalesced data), where the > abstract file system can provide a default implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
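The coalescing half that the ticket wants split out of ReadRangeCache boils down to merging byte ranges that sit close together into fewer, larger reads. A minimal sketch follows; the hole_size_limit knob mirrors the option on the existing cache, and the function name is illustrative:

```python
# Sketch only: merge (offset, length) ranges whose gap is at most
# hole_size_limit, so one I/O call can serve several requested ranges.
# Input ranges are assumed non-overlapping.
def coalesce_ranges(ranges, hole_size_limit):
    merged = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= hole_size_limit:
                # Close enough: extend the previous range over the hole.
                merged[-1] = (prev_offset, offset + length - prev_offset)
                continue
        merged.append((offset, length))
    return merged

# Two nearby ranges fuse; the distant one stays separate.
assert coalesce_ranges([(12, 8), (0, 10), (100, 5)], 4) == [(0, 20), (100, 5)]
```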
[jira] [Updated] (ARROW-18397) [C++] Clear S3 region resolver client at S3 shutdown
[ https://issues.apache.org/jira/browse/ARROW-18397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18397: --- Labels: pull-request-available (was: ) > [C++] Clear S3 region resolver client at S3 shutdown > > > Key: ARROW-18397 > URL: https://issues.apache.org/jira/browse/ARROW-18397 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 10.0.2, 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The S3 region resolver caches a S3 client at module scope. This client can be > destroyed very late and trigger an assertion error in the AWS SDK because it > was already shutdown: > https://github.com/aws/aws-sdk-cpp/issues/2204 > When explicitly finalizing S3, we should ensure we also destroy the cached S3 > client. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18272) [pyarrow] ParquetFile does not recognize GCS cloud path as a string
[ https://issues.apache.org/jira/browse/ARROW-18272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18272: --- Labels: pull-request-available (was: ) > [pyarrow] ParquetFile does not recognize GCS cloud path as a string > --- > > Key: ARROW-18272 > URL: https://issues.apache.org/jira/browse/ARROW-18272 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 10.0.0 >Reporter: Zepu Zhang >Assignee: Miles Granger >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I have a Parquet file at > > path = 'gs://mybucket/abc/d.parquet' > > `pyarrow.parquet.read_metadata(path)` works fine. > > `pyarrow.parquet.ParquetFile(path)` raises "Failed to open local file > 'gs://mybucket/abc/d.parquet'. > > Looks like ParquetFile misses the path resolution logic found in > `read_metadata` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18392) [CI][Python] Some nightly python tests fail due to ACCESS DENIED to S3 bucket
[ https://issues.apache.org/jira/browse/ARROW-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18392: --- Labels: Nightly pull-request-available (was: Nightly) > [CI][Python] Some nightly python tests fail due to ACCESS DENIED to S3 bucket > -- > > Key: ARROW-18392 > URL: https://issues.apache.org/jira/browse/ARROW-18392 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Raúl Cumplido >Assignee: Miles Granger >Priority: Critical > Labels: Nightly, pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Several nightly tests fail with: > {code:java} > === FAILURES > === > test_s3fs_wrong_region > @pytest.mark.s3 > def test_s3fs_wrong_region(): > from pyarrow.fs import S3FileSystem > > # wrong region for bucket > fs = S3FileSystem(region='eu-north-1') > > msg = ("When getting information for bucket > 'voltrondata-labs-datasets': " > r"AWS Error UNKNOWN \(HTTP status 301\) during HeadBucket " > "operation: No response body. Looks like the configured region > is " > "'eu-north-1' while the bucket is located in 'us-east-2'." > "|NETWORK_CONNECTION") > with pytest.raises(OSError, match=msg) as exc: > fs.get_file_info("voltrondata-labs-datasets") > > # Sometimes fails on unrelated network error, so next call would also > fail. > if 'NETWORK_CONNECTION' in str(exc.value): > return > > fs = S3FileSystem(region='us-east-2') > > > > fs.get_file_info("voltrondata-labs-datasets")opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/tests/test_fs.py:1339: > > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > pyarrow/_fs.pyx:571: in pyarrow._fs.FileSystem.get_file_info > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ??? 
> E OSError: When getting information for bucket 'voltrondata-labs-datasets': > AWS Error ACCESS_DENIED during HeadBucket operation: No response body. {code} > I can't seem to be able to reproduce locally but is pretty consistent: > * > [test-conda-python-3.10|https://github.com/ursacomputing/crossbow/actions/runs/3528202639/jobs/5918051269] > * > [test-conda-python-3.11|https://github.com/ursacomputing/crossbow/actions/runs/3528201175/jobs/5918048135] > * > [test-conda-python-3.7|https://github.com/ursacomputing/crossbow/actions/runs/3528195566/jobs/5918035812] > * > [test-conda-python-3.7-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3528211334/jobs/5918069152] > * > [test-conda-python-3.8|https://github.com/ursacomputing/crossbow/actions/runs/3528193702/jobs/5918032370] > * > [test-conda-python-3.8-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3528213536/jobs/5918073481] > * > [test-conda-python-3.8-pandas-nightly|https://github.com/ursacomputing/crossbow/actions/runs/3528205157/jobs/5918056277] > * > [test-conda-python-3.9|https://github.com/ursacomputing/crossbow/actions/runs/3528202402/jobs/5918050613] > * > [test-conda-python-3.9-pandas-upstream_devel|https://github.com/ursacomputing/crossbow/actions/runs/3528210560/jobs/5918067302] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18390) [CI][Python] Nightly python test for spark master missing test module
[ https://issues.apache.org/jira/browse/ARROW-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18390: --- Labels: Nightly pull-request-available (was: Nightly) > [CI][Python] Nightly python test for spark master missing test module > - > > Key: ARROW-18390 > URL: https://issues.apache.org/jira/browse/ARROW-18390 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: Nightly, pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently the nightly test with spark master > [test-conda-python-3.9-spark-master|https://github.com/ursacomputing/crossbow/actions/runs/3528196313/jobs/5918037939] > fails with: > {code:java} > Starting test(python): pyspark.sql.tests.test_pandas_map (temp output: > /spark/python/target/cbca1b18-4af7-4205-aa41-8c945bf1cf58/python__pyspark.sql.tests.test_pandas_map__9ptzo8sa.log) > /opt/conda/envs/arrow/bin/python: No module named > pyspark.sql.tests.test_pandas_grouped_map {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18389) [CI][Python] Update nightly test-conda-python-3.7-pandas-0.24 to pandas >= 1.0
[ https://issues.apache.org/jira/browse/ARROW-18389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18389: --- Labels: pull-request-available (was: ) > [CI][Python] Update nightly test-conda-python-3.7-pandas-0.24 to pandas >= 1.0 > -- > > Key: ARROW-18389 > URL: https://issues.apache.org/jira/browse/ARROW-18389 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Raúl Cumplido >Assignee: Raúl Cumplido >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/ARROW-18173 Removed support for pandas > < 1.0. We should upgrade the nightly test. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18384) [Release][MSYS2] Show pull request title
[ https://issues.apache.org/jira/browse/ARROW-18384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18384: --- Labels: pull-request-available (was: ) > [Release][MSYS2] Show pull request title > > > Key: ARROW-18384 > URL: https://issues.apache.org/jira/browse/ARROW-18384 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18292) [Release][Python] Upload .wheel/.tar.gz for release not RC
[ https://issues.apache.org/jira/browse/ARROW-18292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18292: --- Labels: pull-request-available (was: ) > [Release][Python] Upload .wheel/.tar.gz for release not RC > -- > > Key: ARROW-18292 > URL: https://issues.apache.org/jira/browse/ARROW-18292 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Blocker > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{dev/release/post-09-python.sh}} uploads {{.wheel}}/{{.tar.gz}} for RC ( > https://apache.jfrog.io/ui/native/arrow/python-rc/ ) not release ( > https://apache.jfrog.io/ui/native/arrow/python/ ) . They are the same content > (because we copy artifacts of passed RC to release) but we should upload > {{.wheel}}/{{.tar.gz}} for release to clarify that we use vote passed > artifacts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18383) [C++] Avoid global variables for thread pools and at-fork handlers
[ https://issues.apache.org/jira/browse/ARROW-18383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18383: --- Labels: pull-request-available (was: ) > [C++] Avoid global variables for thread pools and at-fork handlers > -- > > Key: ARROW-18383 > URL: https://issues.apache.org/jira/browse/ARROW-18383 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Investigation revealed an issue where the global IO thread pool could be > constructed before the at-fork handler internal state. The IO thread pool, > created on library load, would register an at-fork handler; then, the at-fork > handler state would be initialized and clobber the handler registered just > before. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-15812) [R] Allow user to supply col_names argument when reading in a CSV dataset
[ https://issues.apache.org/jira/browse/ARROW-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-15812: --- Labels: pull-request-available (was: ) > [R] Allow user to supply col_names argument when reading in a CSV dataset > - > > Key: ARROW-15812 > URL: https://issues.apache.org/jira/browse/ARROW-15812 > Project: Apache Arrow > Issue Type: Sub-task > Components: R >Reporter: Nicola Crane >Assignee: Will Jones >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Allow the user to supply the {{col_names}} argument from {{readr}} when > reading in a dataset. > This is already possible when reading in a single CSV file via > {{arrow::read_csv_arrow()}} via the {{readr_to_csv_read_options}} function, > and so once the C++ functionality to autogenerate column names for Datasets > is implemented, we should hook up {{readr_to_csv_read_options}} in > {{csv_file_format_read_opts}} just like we have with > {{readr_to_csv_parse_options}} in {{csv_file_format_parse_options}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18111) [Go] Remaining Scalar Binary Arithmetic (bitwise, shifts)
[ https://issues.apache.org/jira/browse/ARROW-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18111: --- Labels: pull-request-available (was: ) > [Go] Remaining Scalar Binary Arithmetic (bitwise, shifts) > - > > Key: ARROW-18111 > URL: https://issues.apache.org/jira/browse/ARROW-18111 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18382) [C++] "ADDRESS_SANITIZER" not defined in fuzzing builds
[ https://issues.apache.org/jira/browse/ARROW-18382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18382: --- Labels: pull-request-available (was: ) > [C++] "ADDRESS_SANITIZER" not defined in fuzzing builds > --- > > Key: ARROW-18382 > URL: https://issues.apache.org/jira/browse/ARROW-18382 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Fuzzing builds (as run by OSS-Fuzz) enable Address Sanitizer through their > own set of options rather than by enabling {{ARROW_USE_ASAN}}. However, we > need to be informed of this situation in the Arrow source code. > One example of where this matters is that eternal thread pools produce > spurious leaks at shutdown because of the vector of at-fork handlers; it > therefore needs to be worked around on those builds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18360) [Python] Incorrectly passing schema=None to do_put crashes
[ https://issues.apache.org/jira/browse/ARROW-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18360: --- Labels: pull-request-available (was: ) > [Python] Incorrectly passing schema=None to do_put crashes > -- > > Key: ARROW-18360 > URL: https://issues.apache.org/jira/browse/ARROW-18360 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 9.0.0 >Reporter: Bryan Cutler >Assignee: Miles Granger >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In pyarrow.flight, putting an incorrect value of None for schema in do_put > will lead to a core dump. > In pyarrow 9.0.0, trying to enter the command leads to a > {code} > In [3]: writer, reader = > client.do_put(flight.FlightDescriptor.for_command(cmd), schema=None) > Segmentation fault (core dumped) > {code} > In pyarrow 7.0.0, the kernel crashes after attempting to access the writer > and I got the following: > {code} > In [38]: client = flight.FlightClient('grpc+tls://localhost:9643', > disable_server_verification=True) > In [39]: writer, reader = > client.do_put(flight.FlightDescriptor.for_command(cmd), None) > In [40]: > writer./home/conda/feedstock_root/build_artifacts/arrow-cpp-ext_1644752264449/work/cpp/src/arrow/flight/client.cc:736: > Check failed: (batch_writer_) != (nullptr) > miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow.so.700(+0x66288c)[0x7f0feeae088c] > miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x101)[0x7f0feeae0c91] > miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/../../../libarrow_flight.so.700(+0x7c1e1)[0x7f0fa9e331e1] > miniconda3/envs/dev/lib/python3.10/site-packages/pyarrow/lib.cpython-310-x86_64-linux-gnu.so(+0x17cf1a)[0x7f0fefe7ff1a] > miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03] > 
miniconda3/envs/dev/bin/python(+0x144814)[0x559a7cb8f814] > miniconda3/envs/dev/bin/python(+0x1445bf)[0x559a7cb8f5bf] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x30c)[0x559a7cb7ebcc] > miniconda3/envs/dev/bin/python(+0x1516ac)[0x559a7cb9c6ac] > miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5] > miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf] > miniconda3/envs/dev/bin/python(+0x1ead44)[0x559a7cc35d44] > miniconda3/envs/dev/bin/python(+0x220397)[0x559a7cc6b397] > miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5] > miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf] > miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5] > miniconda3/envs/dev/bin/python(+0x1516ac)[0x559a7cb9c6ac] > miniconda3/envs/dev/bin/python(PyObject_Call+0xb8)[0x559a7cb9d348] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5] > miniconda3/envs/dev/bin/python(+0x151ef3)[0x559a7cb9cef3] > miniconda3/envs/dev/bin/python(+0x1ead44)[0x559a7cc35d44] > miniconda3/envs/dev/bin/python(+0x220397)[0x559a7cc6b397] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x1311)[0x559a7cb7fbd1] > miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x30c)[0x559a7cb7ebcc] > miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x2b05)[0x559a7cb813c5] > miniconda3/envs/dev/bin/python(_PyFunction_Vectorcall+0x6f)[0x559a7cb8f3cf] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x66f)[0x559a7cb7ef2f] > miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d] 
> miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03] > miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x44)[0x559a7cb8c494] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f] > miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d] > miniconda3/envs/dev/bin/python(+0x1416f5)[0x559a7cb8c6f5] > miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x52)[0x559a7cb8c4a2] > miniconda3/envs/dev/bin/python(_PyEval_EvalFrameDefault+0x4d8f)[0x559a7cb8364f] > miniconda3/envs/dev/bin/python(+0x14fc9d)[0x559a7cb9ac9d] > miniconda3/envs/dev/bin/python(_PyObject_GenericGetAttrWithDict+0x4f3)[0x559a7cb8da03] > miniconda3/envs/dev/bin/python(PyObject_GetAttr+0x44)[0x559a7cb8c494] >
[jira] [Updated] (ARROW-18265) [C++] Allow FieldPath to work with ListElement
[ https://issues.apache.org/jira/browse/ARROW-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18265: --- Labels: pull-request-available (was: ) > [C++] Allow FieldPath to work with ListElement > -- > > Key: ARROW-18265 > URL: https://issues.apache.org/jira/browse/ARROW-18265 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Miles Granger >Assignee: Miles Granger >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{FieldRef::FromDotPath}} can parse a single list element field, i.e. {{'path.to.list[0]'}}, but it does not work in practice, failing with: _struct_field: cannot subscript field of type list<>_ Supporting slices or multiple list elements is not within the scope of this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
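For context, the dot-path syntax in question can be sketched as a tiny parser in plain Python (function name and token representation are illustrative, not Arrow's FieldRef API):

```python
import re

def parse_dot_path(path):
    # Minimal sketch of FromDotPath-style parsing with a single list-element
    # subscript such as 'path.to.list[0]'. Illustrative only.
    tokens = []
    for part in path.lstrip(".").split("."):
        m = re.fullmatch(r"(\w+)\[(\d+)\]", part)
        if m:
            tokens.append(m.group(1))       # the list field name
            tokens.append(int(m.group(2)))  # the element index
        else:
            tokens.append(part)
    return tokens
```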
[jira] [Updated] (ARROW-18282) [C++][Python] Support step slicing in list_slice kernel
[ https://issues.apache.org/jira/browse/ARROW-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18282: --- Labels: C++ Python pull-request-available (was: C++ Python) > [C++][Python] Support step slicing in list_slice kernel > --- > > Key: ARROW-18282 > URL: https://issues.apache.org/jira/browse/ARROW-18282 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Miles Granger >Assignee: Miles Granger >Priority: Major > Labels: C++, Python, pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > [GitHub PR 14395 | https://github.com/apache/arrow/pull/14395] adds the {{list_slice}} kernel, but does not implement the case where {{step != 1}}; support for step values other than 1 should be added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
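The intended semantics of the missing {{step != 1}} case can be sketched as ordinary Python slicing applied inside each list, with nulls passed through (a plain-Python reference, not the Arrow implementation):

```python
def list_slice(values, start, stop, step):
    # Reference semantics for the kernel's step slicing: slice each inner
    # list with Python's start:stop:step rules; None propagates as null.
    return [v[start:stop:step] if v is not None else None for v in values]
```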
[jira] [Updated] (ARROW-18379) [Python] Change warnings to _warnings in _plasma_store_entry_point
[ https://issues.apache.org/jira/browse/ARROW-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18379: --- Labels: pull-request-available (was: ) > [Python] Change warnings to _warnings in _plasma_store_entry_point > -- > > Key: ARROW-18379 > URL: https://issues.apache.org/jira/browse/ARROW-18379 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Alenka Frim >Assignee: Alenka Frim >Priority: Major > Labels: pull-request-available > Fix For: 10.0.2, 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > There is a leftover in {{python/pyarrow/__init__.py}} from [https://github.com/apache/arrow/pull/14343]: it still references {{warnings}}, which is imported as {{_warnings}}. > Connected GitHub issue: [https://github.com/apache/arrow/issues/14693] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18303) [Go] Missing tag for compute module
[ https://issues.apache.org/jira/browse/ARROW-18303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18303: --- Labels: pull-request-available (was: ) > [Go] Missing tag for compute module > --- > > Key: ARROW-18303 > URL: https://issues.apache.org/jira/browse/ARROW-18303 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Affects Versions: 10.0.0 >Reporter: Lilian Maurel >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Original Estimate: 1h > Time Spent: 10m > Remaining Estimate: 50m > > Since https://issues.apache.org/jira/browse/ARROW-17456, compute was split into a separate module. > > The import changes from github.com/apache/arrow/go/v9/arrow/compute to github.com/apache/arrow/go/arrow/compute/v10. > > The tag go/arrow/compute/v10.0.0 must be created for go mod resolution. > > Also, in go.mod the line "module github.com/apache/arrow/go/v10/arrow/compute" must be changed to "module github.com/apache/arrow/go/arrow/compute/v10". -- This message was sent by Atlassian Jira (v8.20.10#820010)
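The go.mod edit described above amounts to a one-line rewrite; a sketch (module paths taken verbatim from the report, helper name illustrative — the tag go/arrow/compute/v10.0.0 must additionally be pushed for go mod resolution):

```python
def fix_go_mod(text):
    # Rewrite the module line as described in the report.
    return text.replace(
        "module github.com/apache/arrow/go/v10/arrow/compute",
        "module github.com/apache/arrow/go/arrow/compute/v10",
    )
```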
[jira] [Updated] (ARROW-18374) [Go][CI][Benchmarks] Fix Go Bench Script after conbench change
[ https://issues.apache.org/jira/browse/ARROW-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18374: --- Labels: pull-request-available (was: ) > [Go][CI][Benchmarks] Fix Go Bench Script after conbench change > -- > > Key: ARROW-18374 > URL: https://issues.apache.org/jira/browse/ARROW-18374 > Project: Apache Arrow > Issue Type: Bug > Components: Benchmarking, Continuous Integration, Go >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The change in [https://github.com/conbench/conbench/pull/417/files#] now requires passing an explicit {{github=None}} argument to {{BenchmarkResult}} so that it gets the GitHub info from the locally cloned repo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18373) MIGRATION: Enable multiple component selection in issue templates
[ https://issues.apache.org/jira/browse/ARROW-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18373: --- Labels: pull-request-available (was: ) > MIGRATION: Enable multiple component selection in issue templates > - > > Key: ARROW-18373 > URL: https://issues.apache.org/jira/browse/ARROW-18373 > Project: Apache Arrow > Issue Type: Task >Reporter: Todd Farmer >Assignee: Todd Farmer >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Per comments in [this merged PR|https://github.com/apache/arrow/pull/14675], > we would like to enable selection of multiple components when reporting > issues via GitHub issues. > Additionally, we may want to add the needed Apache license to the issue > templates and remove the exclusion rules from rat_exclude_files.txt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18363) [Docs] Include warning when viewing old docs (redirecting to stable/dev docs)
[ https://issues.apache.org/jira/browse/ARROW-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18363: --- Labels: pull-request-available (was: ) > [Docs] Include warning when viewing old docs (redirecting to stable/dev docs) > - > > Key: ARROW-18363 > URL: https://issues.apache.org/jira/browse/ARROW-18363 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now that we have versioned docs, we also host old versions of the developer docs (e.g. https://arrow.apache.org/docs/9.0/developers/guide/communication.html). Those might be outdated (e.g. regarding communication channels, build instructions, etc.), and when contributing to or developing against the latest Arrow, one should _always_ check the latest dev version of the contributing docs. > We could add a warning box pointing this out and linking to the dev docs, similarly to how some projects warn about viewing old docs in general and point to the stable docs (e.g. https://mne.tools/1.1/index.html or https://scikit-learn.org/1.0/user_guide.html). In our case, a custom box on pages under /developers could point to the dev docs instead of the stable docs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
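The proposed banner logic can be sketched as a small URL check (a hypothetical helper, not the docs theme's actual code): versioned paths like /docs/9.0/... get a warning, and pages under /developers point at the dev docs rather than the stable docs.

```python
import re

def old_docs_banner(pathname):
    # Detect an old-version docs URL and build the warning text.
    m = re.match(r"/docs/(\d+(?:\.\d+)?)/", pathname)
    if m is None:
        return None  # stable or dev docs: no banner needed
    target = ("https://arrow.apache.org/docs/dev/"
              if "/developers/" in pathname
              else "https://arrow.apache.org/docs/")
    return f"You are viewing docs for version {m.group(1)}; see {target}"
```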
[jira] [Updated] (ARROW-17676) [C++] [Python] User-defined tabular functions
[ https://issues.apache.org/jira/browse/ARROW-17676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-17676: --- Labels: pull-request-available (was: ) > [C++] [Python] User-defined tabular functions > - > > Key: ARROW-17676 > URL: https://issues.apache.org/jira/browse/ARROW-17676 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Python >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, only stateless user-defined functions are supported in PyArrow. > This issue will add support for user-defined tabular functions, i.e. functions implemented in Python that return a stateful stream of tabular data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
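A hedged sketch of what a "stateful stream of tabular data" means in practice: a closure that yields one batch per call and remembers how far it has read. Names and the dict-based batch representation are illustrative; this is not the PyArrow UDF registration API.

```python
def make_tabular_source(n_batches, batch_size):
    # State survives across calls, which is exactly what a stateless UDF
    # cannot express.
    state = {"next_row": 0}

    def next_batch():
        if state["next_row"] >= n_batches * batch_size:
            return None  # end of stream
        start = state["next_row"]
        state["next_row"] += batch_size
        return {"row_id": list(range(start, start + batch_size))}

    return next_batch
```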
[jira] [Updated] (ARROW-18366) [Packaging][RPM][Gandiva] Failed to link on AlmaLinux 9
[ https://issues.apache.org/jira/browse/ARROW-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18366: --- Labels: pull-request-available (was: ) > [Packaging][RPM][Gandiva] Failed to link on AlmaLinux 9 > > > Key: ARROW-18366 > URL: https://issues.apache.org/jira/browse/ARROW-18366 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > https://github.com/ursacomputing/crossbow/actions/runs/3502784911/jobs/5867407921#step:6:4748 > {noformat} > FAILED: gandiva-glib/Gandiva-1.0.gir > env > PKG_CONFIG_PATH=/usr/lib64/pkgconfig:/usr/share/pkgconfig:/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/meson-uninstalled > /usr/bin/g-ir-scanner --quiet --no-libtool --namespace=Gandiva > --nsversion=1.0 --warn-all --output gandiva-glib/Gandiva-1.0.gir > --c-include=gandiva-glib/gandiva-glib.h --warn-all > --include-uninstalled=./arrow-glib/Arrow-1.0.gir > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/gandiva-glib > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/gandiva-glib > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/. > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/. > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/redhat-linux-build/src > > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/redhat-linux-build/src > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/src > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/src > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/. > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/. 
> -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/redhat-linux-build/src > > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/redhat-linux-build/src > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/src > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/src > --filelist=/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/gandiva-glib/libgandiva-glib.so.1100.0.0.p/Gandiva_1.0_gir_filelist > --include=Arrow-1.0 --symbol-prefix=ggandiva --identifier-prefix=GGandiva > --pkg-export=gandiva-glib --cflags-begin > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/. > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/. > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/redhat-linux-build/src > > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/redhat-linux-build/src > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/../cpp/src > -I/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../cpp/src > -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include > -I/usr/include/sysprof-4 -I/usr/include/gobject-introspection-1.0 > --cflags-end > --add-include-path=/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/arrow-glib > --add-include-path=/usr/share/gir-1.0 > -L/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/gandiva-glib > --library gandiva-glib > -L/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/arrow-glib > -L/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../../cpp/redhat-linux-build/release > --extra-library=gobject-2.0 --extra-library=glib-2.0 > --extra-library=girepository-1.0 --sources-top-dirs > /build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/ --sources-top-dirs > /build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/ --warn-error > /usr/bin/ld: > 
/build/rpmbuild/BUILD/apache-arrow-11.0.0.dev130/c_glib/build/../../cpp/redhat-linux-build/release/libgandiva.so.1100: > undefined reference to `std::__glibcxx_assert_fail(char const*, int, char > const*, char const*)' > collect2: error: ld returned 1 exit status > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-15470) [R] Allows user to specify string to be used for missing data when writing CSV dataset
[ https://issues.apache.org/jira/browse/ARROW-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-15470: --- Labels: pull-request-available (was: ) > [R] Allows user to specify string to be used for missing data when writing > CSV dataset > -- > > Key: ARROW-15470 > URL: https://issues.apache.org/jira/browse/ARROW-15470 > Project: Apache Arrow > Issue Type: Sub-task > Components: R >Reporter: Nicola Crane >Assignee: Will Jones >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The ability to select the string to be used for missing data was implemented > for the CSV Writer in ARROW-14903 and as David Li points out below, is > available, so I think we just need to hook it up on the R side. > This requires the values passed in as the "na" argument to be instead passed > through to "null_strings", similarly to what has been done with "skip" and > "skip_rows" in ARROW-15743. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18345) [R] Create a CRAN-specific packaging checklist that lives in the R package directory
[ https://issues.apache.org/jira/browse/ARROW-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18345: --- Labels: pull-request-available (was: ) > [R] Create a CRAN-specific packaging checklist that lives in the R package > directory > - > > Key: ARROW-18345 > URL: https://issues.apache.org/jira/browse/ARROW-18345 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Dewey Dunnington >Assignee: Dewey Dunnington >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Like other packaging tasks, the CRAN packaging task for the R package (which is concerned with making sure the R package from the Arrow release complies with CRAN policies) is slightly different from the overall Arrow release task for the R package. For example, we often push patch-patch releases if the two-week window we get to "safely retain the package on CRAN" does not line up with a release vote. [~npr] has heroically been doing this for a long time, and while he has equally heroically volunteered to keep doing it, I am hoping the process of codifying this somewhere in the R repo will help a wider set of contributors understand it (even if it was already documented elsewhere!). > [~stephhazlitt] and I use {{usethis::use_release_issue()}} to manage our personal R package releases, and I'm wondering if creating a similar function or markdown template would help here. > I'm happy to start the process of putting a PR up for discussion! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18362) [Parquet][C++] Accelerate bit-packing decoding with AVX-512
[ https://issues.apache.org/jira/browse/ARROW-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18362: --- Labels: pull-request-available (was: ) > [Parquet][C++] Accelerate bit-packing decoding with AVX-512 > --- > > Key: ARROW-18362 > URL: https://issues.apache.org/jira/browse/ARROW-18362 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Parquet >Reporter: zhaoyaqi >Assignee: zhaoyaqi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Accelerate Parquet bit-packing decoding with AVX-512 instructions? -- This message was sent by Atlassian Jira (v8.20.10#820010)
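For context, the inner loop being vectorized: Parquet bit-packing stores integers LSB-first in a contiguous bit stream, and decoding is exactly the loop an AVX-512 kernel replaces with wide shuffles and shifts. A plain-Python scalar reference, not Arrow's implementation:

```python
def unpack_bits(packed, bit_width, count):
    # Decode `count` integers of `bit_width` bits each from a packed,
    # LSB-first byte buffer (Parquet's bit-packed encoding layout).
    out = []
    bitpos = 0
    for _ in range(count):
        value = 0
        for i in range(bit_width):
            byte = packed[(bitpos + i) // 8]
            value |= ((byte >> ((bitpos + i) % 8)) & 1) << i
        out.append(value)
        bitpos += bit_width
    return out
```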
[jira] [Updated] (ARROW-18323) MIGRATION TEST ISSUE #2
[ https://issues.apache.org/jira/browse/ARROW-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18323: --- Labels: pull-request-available (was: ) > MIGRATION TEST ISSUE #2 > --- > > Key: ARROW-18323 > URL: https://issues.apache.org/jira/browse/ARROW-18323 > Project: Apache Arrow > Issue Type: Task >Reporter: Todd Farmer >Assignee: Todd Farmer >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This issue was created to help test migration-related process and tooling. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18361) [CI][Conan] Merge upstream changes
[ https://issues.apache.org/jira/browse/ARROW-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18361: --- Labels: pull-request-available (was: ) > [CI][Conan] Merge upstream changes > -- > > Key: ARROW-18361 > URL: https://issues.apache.org/jira/browse/ARROW-18361 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Updated: https://github.com/conan-io/conan-center-index/pull/14111 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18110) [Go] Scalar Comparisons
[ https://issues.apache.org/jira/browse/ARROW-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18110: --- Labels: pull-request-available (was: ) > [Go] Scalar Comparisons > --- > > Key: ARROW-18110 > URL: https://issues.apache.org/jira/browse/ARROW-18110 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Matthew Topol >Assignee: Matthew Topol >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18349) [CI][C++][Flight] Exercise UCX on CI
[ https://issues.apache.org/jira/browse/ARROW-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18349: --- Labels: pull-request-available (was: ) > [CI][C++][Flight] Exercise UCX on CI > > > Key: ARROW-18349 > URL: https://issues.apache.org/jira/browse/ARROW-18349 > Project: Apache Arrow > Issue Type: Task > Components: C++, Continuous Integration, FlightRPC >Reporter: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 11.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > UCX doesn't seem enabled on any CI configuration for now. > We should have at least a nightly job with UCX enabled, for example one of > the Conda or Ubuntu builds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18350) [C++] Use std::to_chars instead of std::to_string
[ https://issues.apache.org/jira/browse/ARROW-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-18350: --- Labels: pull-request-available (was: ) > [C++] Use std::to_chars instead of std::to_string > - > > Key: ARROW-18350 > URL: https://issues.apache.org/jira/browse/ARROW-18350 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {{std::to_chars}} is locale-independent unlike {{std::to_string}}; it may > also be faster in some cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)