[jira] [Created] (ARROW-7526) [C++][Compute]: Optimize small integer sorting
Yibo Cai created ARROW-7526: --- Summary: [C++][Compute]: Optimize small integer sorting Key: ARROW-7526 URL: https://issues.apache.org/jira/browse/ARROW-7526 Project: Apache Arrow Issue Type: Improvement Components: C++ - Compute Reporter: Yibo Cai Assignee: Yibo Cai The current sorting kernel handles all data types with STL stable_sort. This is suboptimal for small integer types such as Int8, for which counting sort is more suitable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
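To illustrate why counting sort suits Int8, here is a minimal sketch of a stable counting sort that returns sorted indices. It is not the Arrow kernel; the function name and the plain-list input are assumptions made for illustration. Because Int8 has only 256 possible values, one pass to count and one pass to place are enough, giving O(n + 256) work instead of O(n log n).

```python
def counting_sort_indices_int8(values):
    """Return indices that sort `values` (integers in [-128, 127]) stably.

    Hypothetical helper for illustration only; not part of Arrow.
    """
    offset = 128  # shift so that -128 maps to bucket 0
    # counts[b + 1] holds the number of values in bucket b, so that after
    # the prefix sum counts[b] is the first output slot for bucket b.
    counts = [0] * 257
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"value {v} is outside the Int8 range")
        counts[v + offset + 1] += 1
    for b in range(1, 257):
        counts[b] += counts[b - 1]
    out = [0] * len(values)
    # Walking the input in order keeps equal values in their original order.
    for i, v in enumerate(values):
        b = v + offset
        out[counts[b]] = i
        counts[b] += 1
    return out


data = [5, -128, 127, 5, 0]
indices = counting_sort_indices_int8(data)
sorted_values = [data[i] for i in indices]
```

The same idea extends to UInt8 by dropping the offset. For wider integer types the bucket array grows quickly, which is why the issue limits the proposal to small integers.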
[jira] [Created] (ARROW-7527) [Python] pandas/feather tests failing on pandas master
Joris Van den Bossche created ARROW-7527: Summary: [Python] pandas/feather tests failing on pandas master Key: ARROW-7527 URL: https://issues.apache.org/jira/browse/ARROW-7527 Project: Apache Arrow Issue Type: Test Components: Python Reporter: Joris Van den Bossche Because I merged a PR in pandas to support Period dtype, some tests in pyarrow are now failing (they were using period dtype to test "unsupported" dtypes) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7528) [Python] The pandas.datetime class (import of datetime.datetime) is deprecated
Joris Van den Bossche created ARROW-7528: Summary: [Python] The pandas.datetime class (import of datetime.datetime) is deprecated Key: ARROW-7528 URL: https://issues.apache.org/jira/browse/ARROW-7528 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Joris Van den Bossche Assignee: Joris Van den Bossche Fix For: 0.16.0 The {{pd.datetime}} was actually just an import from {{datetime.datetime}}, and is being removed from pandas (to use the stdlib one directly). -- This message was sent by Atlassian Jira (v8.3.4#803005)
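For code that still uses the alias, the replacement is mechanical. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Before (deprecated alias that pandas is removing):
#     import pandas as pd
#     stamp = pd.datetime(2020, 1, 9, 12, 30)

# After: use the standard-library class that the alias pointed to.
stamp = datetime(2020, 1, 9, 12, 30)
iso_text = stamp.isoformat()
```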
[jira] [Created] (ARROW-7529) [C++][Gandiva] Handle utf8 characters for castVARCHAR(string, int) function
Projjal Chanda created ARROW-7529: - Summary: [C++][Gandiva] Handle utf8 characters for castVARCHAR(string, int) function Key: ARROW-7529 URL: https://issues.apache.org/jira/browse/ARROW-7529 Project: Apache Arrow Issue Type: Task Components: C++ - Gandiva Reporter: Projjal Chanda Assignee: Projjal Chanda -- This message was sent by Atlassian Jira (v8.3.4#803005)
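The issue carries no description, so the following is only a guess at the kind of problem involved: if the length argument is meant to count characters rather than bytes, truncation must not cut a multi-byte UTF-8 sequence in half. A minimal sketch under that assumption (the function name is hypothetical and this is not Gandiva code):

```python
def truncate_utf8(data: bytes, max_chars: int) -> bytes:
    """Return the prefix of valid UTF-8 `data` holding at most `max_chars` characters.

    Hypothetical helper for illustration; assumes `data` is well-formed UTF-8.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    chars = 0
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue the previous character;
        # every other byte starts a new one.
        if (byte & 0xC0) != 0x80:
            if chars == max_chars:
                return data[:pos]
            chars += 1
    return data


text = "héllo".encode("utf-8")  # "é" occupies two bytes
prefix = truncate_utf8(text, 2)
```

A byte-based cut at length 2 would have split "é"; counting only lead bytes keeps the result decodable.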
Pending Java pull requests
Hi, Roughly 40% of the pending pull requests are tagged as Java [1]. Some of them have long threads and some have not been reviewed yet. Considering the upcoming release, it would be great to close or proceed with them, so any additional help from Java developers would be appreciated! Thanks, Krisztian [1]: https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aopen+java
[jira] [Created] (ARROW-7530) [Developer] Do not include list of commits from PR in squashed summary message
Wes McKinney created ARROW-7530: --- Summary: [Developer] Do not include list of commits from PR in squashed summary message Key: ARROW-7530 URL: https://issues.apache.org/jira/browse/ARROW-7530 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Wes McKinney Fix For: 1.0.0 We might assess whether these messages add useful information to the project's commit history. Other projects like Apache Spark have stopped preserving this information. This came up in https://github.com/apache/arrow/pull/6136 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[NIGHTLY] Arrow Build Report for Job nightly-2020-01-09-0
Arrow Build Report for Job nightly-2020-01-09-0
All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0
Failed Tasks:
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-gandiva-jar-osx
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-macos-r-autobrew
- test-conda-python-3.7-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-pandas-master
- wheel-manylinux2010-cp38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-wheel-manylinux2010-cp38
Succeeded Tasks:
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-centos-6
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-centos-7
- centos-8:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-debian-buster
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-gandiva-jar-trusty
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.8-dask-master
- test-conda-pyt
Re: Human-readable version of Arrow Schema?
The desired goal for this feature is trivial modifications, e.g. within an editor, by data-scientists and researchers. I'd go for the flatbuffer's json representation as it is stable and has native support in almost any language or editor due to the ubiquity of JSON. The C interface schema string representation is optimized for developers writing parser/codecs and looks like gibberish to anyone not familiar with python's struct format string. François On Wed, Jan 8, 2020 at 8:50 PM Kohei KaiGai wrote: > > Hello, > > pg2arrow [*1] has '--dump' mode to print out schema definition of the > given Apache Arrow file. > Does it make sense for you? > > $ ./pg2arrow --dump ~/hoge.arrow > [Footer] > {Footer: version=V4, schema={Schema: endianness=little, > fields=[{Field: name="id", nullable=true, type={Int32}, children=[], > custom_metadata=[]}, {Field: name="a", nullable=true, type={Float64}, > children=[], custom_metadata=[]}, {Field: name="b", nullable=true, > type={Decimal: precision=11, scale=7}, children=[], > custom_metadata=[]}, {Field: name="c", nullable=true, type={Struct}, > children=[{Field: name="x", nullable=true, type={Int32}, children=[], > custom_metadata=[]}, {Field: name="y", nullable=true, type={Float32}, > children=[], custom_metadata=[]}, {Field: name="z", nullable=true, > type={Utf8}, children=[], custom_metadata=[]}], custom_metadata=[]}, > {Field: name="d", nullable=true, type={Utf8}, > dictionary={DictionaryEncoding: id=0, indexType={Int32}, > isOrdered=false}, children=[], custom_metadata=[]}, {Field: name="e", > nullable=true, type={Timestamp: unit=us}, children=[], > custom_metadata=[]}, {Field: name="f", nullable=true, type={Utf8}, > children=[], custom_metadata=[]}, {Field: name="random", > nullable=true, type={Float64}, children=[], custom_metadata=[]}], > custom_metadata=[{KeyValue: key="sql_command" value="SELECT *,random() > FROM t"}]}, dictionaries=[{Block: offset=920, metaDataLength=184 > bodyLength=128}], recordBatches=[{Block: 
offset=1232, > metaDataLength=648 bodyLength=386112}]} > [Dictionary Batch 0] > {Block: offset=920, metaDataLength=184 bodyLength=128} > {Message: version=V4, body={DictionaryBatch: id=0, data={RecordBatch: > length=6, nodes=[{FieldNode: length=6, null_count=0}], > buffers=[{Buffer: offset=0, length=0}, {Buffer: offset=0, length=64}, > {Buffer: offset=64, length=64}]}, isDelta=false}, bodyLength=128} > [Record Batch 0] > {Block: offset=1232, metaDataLength=648 bodyLength=386112} > {Message: version=V4, body={RecordBatch: length=3000, > nodes=[{FieldNode: length=3000, null_count=0}, {FieldNode: > length=3000, null_count=60}, {FieldNode: length=3000, null_count=62}, > {FieldNode: length=3000, null_count=0}, {FieldNode: length=3000, > null_count=56}, {FieldNode: length=3000, null_count=66}, {FieldNode: > length=3000, null_count=0}, {FieldNode: length=3000, null_count=0}, > {FieldNode: length=3000, null_count=64}, {FieldNode: length=3000, > null_count=0}, {FieldNode: length=3000, null_count=0}], > buffers=[{Buffer: offset=0, length=0}, {Buffer: offset=0, > length=12032}, {Buffer: offset=12032, length=384}, {Buffer: > offset=12416, length=24000}, {Buffer: offset=36416, length=384}, > {Buffer: offset=36800, length=48000}, {Buffer: offset=84800, > length=0}, {Buffer: offset=84800, length=384}, {Buffer: offset=85184, > length=12032}, {Buffer: offset=97216, length=384}, {Buffer: > offset=97600, length=12032}, {Buffer: offset=109632, length=0}, > {Buffer: offset=109632, length=12032}, {Buffer: offset=121664, > length=96000}, {Buffer: offset=217664, length=0}, {Buffer: > offset=217664, length=12032}, {Buffer: offset=229696, length=384}, > {Buffer: offset=230080, length=24000}, {Buffer: offset=254080, > length=0}, {Buffer: offset=254080, length=12032}, {Buffer: > offset=266112, length=96000}, {Buffer: offset=362112, length=0}, > {Buffer: offset=362112, length=24000}]}, bodyLength=386112} > > [*1] https://heterodb.github.io/pg-strom/arrow_fdw/#using-pg2arrow > > 2019年12月7日(土) 
6:26 Christian Hudon : > > > > Hi, > > > > For the uses I would like to make of Arrow, I would need a human-readable > > and -writable version of an Arrow Schema, that could be converted to and > > from the Arrow Schema C++ object. Going through the doc for 0.15.1, I don't > > see anything to that effect, with the closest being the ToString() method > > on DataType instances, but which is meant for debugging only. (I need an > > expression of an Arrow Schema that people can read, and that can live > > outside of the code for a particular operation.) > > > > Is a text representation of an Arrow Schema something that is being worked > > on now? If not, would you folks be interested in me putting up an initial > > proposal for discussion? Any design constraints I should pay attention to, > > then? > > > > Thanks, > > > > Christian > > -- > > > > > > │ Christian Hudon > > > > │ Applied Research Scientist > > > >Element AI, 6650 Saint-Urbain #500 > > > >Montréal, QC, H2S 3G9, Canada >
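To make the JSON suggestion in this thread concrete, here is a minimal sketch of a hand-editable schema file that is parsed and checked with Python's json module. The key names (fields, name, type, nullable) are hypothetical and loosely follow the attributes shown in the pg2arrow dump above; they are not the Flatbuffers JSON layout or any Arrow API.

```python
import json

# Hypothetical schema text of the kind a researcher might edit by hand.
SCHEMA_TEXT = """
{
  "fields": [
    {"name": "id", "type": "Int32", "nullable": true},
    {"name": "a", "type": "Float64", "nullable": true},
    {"name": "f", "type": "Utf8", "nullable": true}
  ]
}
"""

REQUIRED_KEYS = {"name", "type", "nullable"}


def load_schema(text):
    """Parse the JSON text and check that every field has the expected keys."""
    schema = json.loads(text)
    for field in schema["fields"]:
        missing = REQUIRED_KEYS - field.keys()
        if missing:
            raise ValueError(f"field {field!r} is missing keys: {sorted(missing)}")
    return schema


schema = load_schema(SCHEMA_TEXT)
field_names = [field["name"] for field in schema["fields"]]
```

Converting such a dictionary to and from an Arrow Schema object would be the job of whatever representation the project settles on; that step is deliberately left out here.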
Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)
Hi folks, I think we have reached a point where the incomplete C++ Parquet nested data assembly/disassembly is harming the value of several other parts of the project, for example the Datasets API. As another example, it's possible to ingest nested data from JSON but not write it to Parquet in general. Implementing the nested data read and write path completely is a difficult project requiring at least several weeks of dedicated work, so it's not so surprising that it hasn't been accomplished yet. I know that several people have expressed interest in working on it, but I would like to see if anyone would be able to volunteer a commitment of time and guess at a rough timeline for when this work could be done. It seems to me that if this slips beyond 2020 it will significantly diminish the value being created by other parts of the project. Since I'm pretty familiar with all the Parquet code I'm one candidate person to take on this project (and I can dedicate the time, but it would come at the expense of other projects where I can also be useful). But Micah and others expressed interest in working on it, so I wanted to have a discussion about it to see what others think. Thanks Wes
[jira] [Created] (ARROW-7531) [C++] Investigate header cost reduction
Antoine Pitrou created ARROW-7531: - Summary: [C++] Investigate header cost reduction Key: ARROW-7531 URL: https://issues.apache.org/jira/browse/ARROW-7531 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Antoine Pitrou Using https://github.com/aras-p/ClangBuildAnalyzer we could find out the worst offenders in terms of header file parsing cost when compiling. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Pending Java pull requests
I think there are a decent chunk that are of questionable value. We need to be more willing to simply reject requests rather than leave them in no-man's land. I'll try to do a pass through and help dispatch, etc. On Thu, Jan 9, 2020 at 5:25 AM Krisztián Szűcs wrote: > Hi, > > Roughly 40% of the pending pull requests are tagged as Java [1]. > Some of those having long threads and some of them are not > reviewed yet. Considering the upcoming release it would be great > to close or proceed with them. > So any additional help from Java developers would be appreciated! > > Thanks, Krisztian > > [1]: https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aopen+java >
Re: [DRAFT] Apache Arrow Board Report January 2020
Posted with correction. Thanks to Wes, Antoine and Todd! On Wed, Jan 8, 2020 at 10:15 AM Wes McKinney wrote: > Not sure what happened there. The two words after "grow" can be removed > > ## Description: > > The mission of Apache Arrow is the creation and maintenance of software > related > to columnar in-memory processing and data interchange > > ## Issues: > > There are no issues requiring board attention at this time. > > ## Membership Data: > Apache Arrow was founded 2016-01-19 (4 years ago) > There are currently 50 committers and 28 PMC members in this project. > The Committer-to-PMC ratio is roughly 7:4. > > Community changes, past quarter: > - No new PMC members. Last addition was Micah Kornfield on 2019-08-21. > - Eric Erhardt was added as committer on 2019-10-18 > - Joris Van den Bossche was added as committer on 2019-12-06 > > ## Project Activity: > > * We have completed our initial migration away from Travis CI for > continuous integration and patch validation to use the new > GitHub Actions (GHA) service. We are much happier with the > compute resource allocation provided by GitHub but longer term > we are concerned that the generous free allocation may not > continue and would be interested to know what kinds of > guarantees (if any) GitHub may make to the ASF regarding GHA. > * We are not out of the woods on CI/CD as there are features of Apache > Arrow > that we cannot test in GitHub Actions. We are still considering options > for > running these optional test workloads as well as other kinds of periodic > workloads like benchmarking > * We hope to make a 1.0.0 release of the project in early 2020. We had > thought > that our next major release after 0.15.0 would be 1.0.0 but we have not > yet > completed some necessary work items that the community has agreed are > essential to graduate to 1.0.0 > > Recent releases: > 0.15.0 was released on 2019-10-05. > 0.14.1 was released on 2019-07-21. > 0.14.0 was released on 2019-07-04. 
> > ## Community Health: > > The developer community is healthy and continues to grow. > > On Wed, Jan 8, 2020 at 12:12 PM Todd Hendricks > wrote: > > > > Hi Wes, > > > > Looks like there is a cutoff sentence at the end of the Community Health > > section. > > > > On Wed, Jan 8, 2020 at 10:01 AM Wes McKinney > wrote: > > > > > Here is an updated draft. If there is no more feedback, this can be > > > submitted to the board > > > > > > ## Description: > > > > > > The mission of Apache Arrow is the creation and maintenance of software > > > related > > > to columnar in-memory processing and data interchange > > > > > > ## Issues: > > > > > > There are no issues requiring board attention at this time. > > > > > > ## Membership Data: > > > Apache Arrow was founded 2016-01-19 (4 years ago) > > > There are currently 50 committers and 28 PMC members in this project. > > > The Committer-to-PMC ratio is roughly 7:4. > > > > > > Community changes, past quarter: > > > - No new PMC members. Last addition was Micah Kornfield on 2019-08-21. > > > - Eric Erhardt was added as committer on 2019-10-18 > > > - Joris Van den Bossche was added as committer on 2019-12-06 > > > > > > ## Project Activity: > > > > > > * We have completed our initial migration away from Travis CI for > > > continuous integration and patch validation to use the new > > > GitHub Actions (GHA) service. We are much happier with the > > > compute resource allocation provided by GitHub but longer term > > > we are concerned that the generous free allocation may not > > > continue and would be interested to know what kinds of > > > guarantees (if any) GitHub may make to the ASF regarding GHA. > > > * We are not out of the woods on CI/CD as there are features of Apache > > > Arrow > > > that we cannot test in GitHub Actions. 
We are still considering > options > > > for > > > running these optional test workloads as well as other kinds of > periodic > > > workloads like benchmarking > > > * We hope to make a 1.0.0 release of the project in early 2020. We had > > > thought > > > that our next major release after 0.15.0 would be 1.0.0 but we have > not > > > yet > > > completed some necessary work items that the community has agreed are > > > essential to graduate to 1.0.0 > > > > > > Recent releases: > > > 0.15.0 was released on 2019-10-05. > > > 0.14.1 was released on 2019-07-21. > > > 0.14.0 was released on 2019-07-04. > > > > > > ## Community Health: > > > > > > The developer community is healthy and continues to grow.THe co > > > > > > On Mon, Jan 6, 2020 at 11:16 AM Antoine Pitrou > wrote: > > > > > > > > > > > > Perhaps also mention that we're dependent on enough capacity on > GitHub > > > > Actions currently. I'm not sure how long their generosity will last > :-) > > > > > > > > > > > > Le 06/01/2020 à 18:14, Wes McKinney a écrit : > > > > > There is still the question of how to manage CI tasks (e.g. > > > > > GPU-enabled, ARM-enabled) that are unable to be run i
Re: Pending Java pull requests
My time has been more limited lately, but I'll try to work through some of these as well over the next couple of days. On Thu, Jan 9, 2020 at 8:44 AM Jacques Nadeau wrote: > I think there are a decent chunk that are of questionable value. We need to > be more willing to simply reject requests rather than leave them in > no-man's land. I'll try to do a pass through and help dispatch, etc. > > On Thu, Jan 9, 2020 at 5:25 AM Krisztián Szűcs > wrote: > > > Hi, > > > > Roughly 40% of the pending pull requests are tagged as Java [1]. > > Some of those having long threads and some of them are not > > reviewed yet. Considering the upcoming release it would be great > > to close or proceed with them. > > So any additional help from Java developers would be appreciated! > > > > Thanks, Krisztian > > > > [1]: https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aopen+java > > >
[jira] [Created] (ARROW-7532) [CI] Unskip brew test after Homebrew fixes it upstream
Neal Richardson created ARROW-7532: -- Summary: [CI] Unskip brew test after Homebrew fixes it upstream Key: ARROW-7532 URL: https://issues.apache.org/jira/browse/ARROW-7532 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Neal Richardson Assignee: Neal Richardson Followup to ARROW-7492. See https://github.com/Homebrew/brew/issues/6908. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7533) [Java] Move ArrowBufPointer out of the java the memory package
Jacques Nadeau created ARROW-7533: - Summary: [Java] Move ArrowBufPointer out of the java the memory package Key: ARROW-7533 URL: https://issues.apache.org/jira/browse/ARROW-7533 Project: Apache Arrow Issue Type: Task Components: Java Reporter: Jacques Nadeau Assignee: Liya Fan The memory package is focused on memory access and management. ArrowBufPointer should be moved to the algorithm package, as it isn't core to the Arrow memory management primitives. I would further suggest that it is an anti-pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7534) Create a new java/contrib module
Jacques Nadeau created ARROW-7534: - Summary: Create a new java/contrib module Key: ARROW-7534 URL: https://issues.apache.org/jira/browse/ARROW-7534 Project: Apache Arrow Issue Type: Task Reporter: Jacques Nadeau Assignee: Liya Fan To better clarify the status of java sub-modules, create a contrib module and move the following modules underneath it. * algorithm * adapter * plasma -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Timeline for next major release [was Re: Looking to 1.0]
It would be helpful that when something is assigned to a release and you want to push it out, you push it to the next release as opposed to removing a fix version entirely. Thanks! On Tue, Jan 7, 2020 at 10:26 AM Wes McKinney wrote: > I just renamed the 1.0.0 release version in JIRA to 0.16.0 and will > work on removing issues that are not necessary to be able to release > (others, please help). If we make miraculous progress with the 1.0.0 > columnar format blockers (per discussion below), we can change this > back, but I think either way we should put ourselves on a critical > path to have an RC cut by Friday January 24. Does that seem doable? > > On Tue, Jan 7, 2020 at 10:25 AM Wes McKinney wrote: > > > > We absolutely should have a list of exactly what needs to be done to > > put out the 1.0.0 release, but based on what we know needs to be done > > I am not optimistic that it can all be accomplished before the end of > > January. That doesn't mean that we should assume these things won't > > get done before March/April time frame. If they get done sooner, let's > > release 1.0.0 sooner. > > > > On Mon, Jan 6, 2020 at 6:03 PM Neal Richardson > > wrote: > > > > > > I'm all for maintaining a regular cadence of releases, but before we > cast > > > aside the idea of 1.0, I'd still encourage us to do the work of > enumerating > > > what truly must happen before we call a release 1.0 so that we can get > it > > > done. Otherwise, in April we're going to be talking about doing a 0.17 > > > release. > > > > > > I believe I've found the issues that Wes referenced and added them as > > > "blockers" to 1.0.0. That brings the total blocker count listed on > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release > to 10 > > > issues, though some may be overlapping/redundant. Do we think this is > an > > > exhaustive list of blockers? Should some of these be downgraded to > > > not-blocking? 
If we were to resolve all 10 of these issues, would we > have > > > consensus that we're ready for 1.0? > > > > > > Would it help to update this wiki, which seems pretty stale at this > point? > > > > https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone > > > > > > Thanks, > > > Neal > > > > > > > > > On Mon, Jan 6, 2020 at 11:40 AM Bryan Cutler > wrote: > > > > > > > I agree on a 0.16.0 release. In the meantime I'll try to help out > with > > > > getting the Java side ready for 1.0. > > > > > > > > On Sat, Jan 4, 2020 at 7:21 PM Fan Liya > wrote: > > > > > > > > > Hi Jacques, > > > > > > > > > > ARROW-4526 is interesting. I would like to try to resolve it. > > > > > Thanks a lot for the information. > > > > > > > > > > Best, > > > > > Liya Fan > > > > > > > > > > > > > > > On Sun, Jan 5, 2020 at 6:14 AM Jacques Nadeau > > > > wrote: > > > > > > > > > > > The third ticket I was commenting on was ARROW-4526. > > > > > > > > > > > > Fan, do you want to take a shot at that one? > > > > > > > > > > > > On Fri, Jan 3, 2020 at 8:16 PM Fan Liya > wrote: > > > > > > > > > > > > > Hi Jacques, > > > > > > > > > > > > > > I am interested in the issues, and if it is possible, I would > like to > > > > > try > > > > > > > to resolve them. > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > Liya Fan > > > > > > > > > > > > > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau < > jacq...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > > > I identified three things in the java library that I think > are top > > > > of > > > > > > > mind > > > > > > > > and should be fixed before 1.0 to avoid weird incompatibility > > > > changes > > > > > > in > > > > > > > > the java apis (technical debt). I've tagged them as pre-1.0 > as I > > > > > don't > > > > > > > > exactly see what is the right way to tag/label a target > release > > > > for a > > > > > > > > ticket. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-7495?jql=labels%20%3D%20pre-1.0 > > > > > > > > > > > > > > > > For the three tickets I identified, does anyone have > interest in > > > > > trying > > > > > > > to > > > > > > > > resolve? > > > > > > > > > > > > > > > > thanks, > > > > > > > > Jacques > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jan 2, 2020 at 11:55 AM Neal Richardson < > > > > > > > > neal.p.richard...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > Happy new year! As we look ahead to 2020, it's time to > start > > > > > > mobilizing > > > > > > > > for > > > > > > > > > the Arrow 1.0 release. At 0.15, I believe we decided that > our > > > > next > > > > > > > > release > > > > > > > > > should be 1.0, and it's been a couple of months since > 0.15, so > > > > > we're > > > > > > > due > > > > > > > > to > > > > > > > > > release again this month, give or take. (See [1] for when > we most > > > > > > > > recently > > > > > > > > > discussed doing 1.0 back in June, or if you
[jira] [Created] (ARROW-7535) [C++] ASAN failure in validation
Neal Richardson created ARROW-7535: -- Summary: [C++] ASAN failure in validation Key: ARROW-7535 URL: https://issues.apache.org/jira/browse/ARROW-7535 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Neal Richardson Assignee: Antoine Pitrou Fix For: 0.16.0 See https://github.com/apache/arrow/runs/376565647#step:5:2035. This is a cron GHA job, so it doesn't show up in our nightly crossbow email. It looks like it's been failing since ARROW-7435 merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Timeline for next major release [was Re: Looking to 1.0]
Will do -- there were many C++ and Python-related issues that I think were put in 1.0.0 / 0.16.0 overly optimistically and so I removed the Fix Version entirely (some of these had been pushed off 3-4 major releases ago). I may have removed some Fix Versions from other components that should have been rolled over -- sorry about that. It's hard to judge on some issues that have been open for 6-12 months or more. In general I think we should try to be more conservative about what issues we pre-emptively assign fix versions -- there may be a more constructive way that we can prioritize issues and distinguish between "optimistic" / nice-to-have issues and "must do to release" issues. On Thu, Jan 9, 2020 at 12:42 PM Jacques Nadeau wrote: > > It would be helpful that when something is assigned to a release and you > want to push it out, you push it to the next release as opposed to removing > a fix version entirely. Thanks! > > On Tue, Jan 7, 2020 at 10:26 AM Wes McKinney wrote: > > > I just renamed the 1.0.0 release version in JIRA to 0.16.0 and will > > work on removing issues that are not necessary to be able to release > > (others, please help). If we make miraculous progress with the 1.0.0 > > columnar format blockers (per discussion below), we can change this > > back, but I think either way we should put ourselves on a critical > > path to have an RC cut by Friday January 24. Does that seem doable? > > > > On Tue, Jan 7, 2020 at 10:25 AM Wes McKinney wrote: > > > > > > We absolutely should have a list of exactly what needs to be done to > > > put out the 1.0.0 release, but based on what we know needs to be done > > > I am not optimistic that it can all be accomplished before the end of > > > January. That doesn't mean that we should assume these things won't > > > get done before March/April time frame. If they get done sooner, let's > > > release 1.0.0 sooner. 
> > > > > > On Mon, Jan 6, 2020 at 6:03 PM Neal Richardson > > > wrote: > > > > > > > > I'm all for maintaining a regular cadence of releases, but before we > > cast > > > > aside the idea of 1.0, I'd still encourage us to do the work of > > enumerating > > > > what truly must happen before we call a release 1.0 so that we can get > > it > > > > done. Otherwise, in April we're going to be talking about doing a 0.17 > > > > release. > > > > > > > > I believe I've found the issues that Wes referenced and added them as > > > > "blockers" to 1.0.0. That brings the total blocker count listed on > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release > > to 10 > > > > issues, though some may be overlapping/redundant. Do we think this is > > an > > > > exhaustive list of blockers? Should some of these be downgraded to > > > > not-blocking? If we were to resolve all 10 of these issues, would we > > have > > > > consensus that we're ready for 1.0? > > > > > > > > Would it help to update this wiki, which seems pretty stale at this > > point? > > > > > > https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone > > > > > > > > Thanks, > > > > Neal > > > > > > > > > > > > On Mon, Jan 6, 2020 at 11:40 AM Bryan Cutler > > wrote: > > > > > > > > > I agree on a 0.16.0 release. In the meantime I'll try to help out > > with > > > > > getting the Java side ready for 1.0. > > > > > > > > > > On Sat, Jan 4, 2020 at 7:21 PM Fan Liya > > wrote: > > > > > > > > > > > Hi Jacques, > > > > > > > > > > > > ARROW-4526 is interesting. I would like to try to resolve it. > > > > > > Thanks a lot for the information. > > > > > > > > > > > > Best, > > > > > > Liya Fan > > > > > > > > > > > > > > > > > > On Sun, Jan 5, 2020 at 6:14 AM Jacques Nadeau > > > > > wrote: > > > > > > > > > > > > > The third ticket I was commenting on was ARROW-4526. > > > > > > > > > > > > > > Fan, do you want to take a shot at that one? 
> > > > > > > > > > > > > > On Fri, Jan 3, 2020 at 8:16 PM Fan Liya > > wrote: > > > > > > > > > > > > > > > Hi Jacques, > > > > > > > > > > > > > > > > I am interested in the issues, and if it is possible, I would > > like to > > > > > > try > > > > > > > > to resolve them. > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > Liya Fan > > > > > > > > > > > > > > > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau < > > jacq...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > > > I identified three things in the java library that I think > > are top > > > > > of > > > > > > > > mind > > > > > > > > > and should be fixed before 1.0 to avoid weird incompatibility > > > > > changes > > > > > > > in > > > > > > > > > the java apis (technical debt). I've tagged them as pre-1.0 > > as I > > > > > > don't > > > > > > > > > exactly see what is the right way to tag/label a target > > release > > > > > for a > > > > > > > > > ticket. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-7495?jql=labels%20%3D%20pre-1.0 > > > > > > > >
[jira] [Created] (ARROW-7536) [Java] [Dev] `docker-compose pull debian-java` fails
Antoine Pitrou created ARROW-7536: - Summary: [Java] [Dev] `docker-compose pull debian-java` fails Key: ARROW-7536 URL: https://issues.apache.org/jira/browse/ARROW-7536 Project: Apache Arrow Issue Type: Bug Components: Developer Tools, Java Reporter: Antoine Pitrou Assignee: Krisztian Szucs I get the following error here: {code} $ docker-compose pull debian-java Pulling debian-java ... error ERROR: for debian-java manifest for apache/arrow-dev:amd64-debian-9-java-8-maven-3.5.4 not found ERROR: manifest for apache/arrow-dev:amd64-debian-9-java-8-maven-3.5.4 not found {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7537) [CI][R] Nightly macOS autobrew job should be more verbose if it fails
Neal Richardson created ARROW-7537: -- Summary: [CI][R] Nightly macOS autobrew job should be more verbose if it fails Key: ARROW-7537 URL: https://issues.apache.org/jira/browse/ARROW-7537 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 0.16.0 Things like https://travis-ci.org/ursa-labs/crossbow/builds/634643469#L673-L676 are hard to debug because the installation log is not printed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Timeline for next major release [was Re: Looking to 1.0]
Understood and appreciated. Yeah, it can become a bit of a mess. On Thu, Jan 9, 2020 at 12:22 PM Wes McKinney wrote: > Will do -- there were many C++ and Python-related issues that I think > were put in 1.0.0 / 0.16.0 overly optimistically and so I removed the > Fix Version entirely (some of these had been pushed off 3-4 major > releases ago). I may have removed some Fix Versions from other > components that should have been rolled over -- sorry about that. It's > hard to judge on some issues that have been open for 6-12 months or > more. > > In general I think we should try to be more conservative about what > issues we pre-emptively assign fix versions -- there may be a more > constructive way that we can prioritize issues and distinguish between > "optimistic" / nice-to-have issues and "must do to release" issues. > > On Thu, Jan 9, 2020 at 12:42 PM Jacques Nadeau wrote: > > > > It would be helpful that when something is assigned to a release and you > > want to push it out, you push it to the next release as opposed to > removing > > a fix version entirely. Thanks! > > > > On Tue, Jan 7, 2020 at 10:26 AM Wes McKinney > wrote: > > > > > I just renamed the 1.0.0 release version in JIRA to 0.16.0 and will > > > work on removing issues that are not necessary to be able to release > > > (others, please help). If we make miraculous progress with the 1.0.0 > > > columnar format blockers (per discussion below), we can change this > > > back, but I think either way we should put ourselves on a critical > > > path to have an RC cut by Friday January 24. Does that seem doable? > > > > > > On Tue, Jan 7, 2020 at 10:25 AM Wes McKinney > wrote: > > > > > > > > We absolutely should have a list of exactly what needs to be done to > > > > put out the 1.0.0 release, but based on what we know needs to be done > > > > I am not optimistic that it can all be accomplished before the end of > > > > January. 
That doesn't mean that we should assume these things won't > > > > get done before March/April time frame. If they get done sooner, > let's > > > > release 1.0.0 sooner. > > > > > > > > On Mon, Jan 6, 2020 at 6:03 PM Neal Richardson > > > > wrote: > > > > > > > > > > I'm all for maintaining a regular cadence of releases, but before > we > > > cast > > > > > aside the idea of 1.0, I'd still encourage us to do the work of > > > enumerating > > > > > what truly must happen before we call a release 1.0 so that we can > get > > > it > > > > > done. Otherwise, in April we're going to be talking about doing a > 0.17 > > > > > release. > > > > > > > > > > I believe I've found the issues that Wes referenced and added them > as > > > > > "blockers" to 1.0.0. That brings the total blocker count listed on > > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release > > > to 10 > > > > > issues, though some may be overlapping/redundant. Do we think this > is > > > an > > > > > exhaustive list of blockers? Should some of these be downgraded to > > > > > not-blocking? If we were to resolve all 10 of these issues, would > we > > > have > > > > > consensus that we're ready for 1.0? > > > > > > > > > > Would it help to update this wiki, which seems pretty stale at this > > > point? > > > > > > > > > https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone > > > > > > > > > > Thanks, > > > > > Neal > > > > > > > > > > > > > > > On Mon, Jan 6, 2020 at 11:40 AM Bryan Cutler > > > wrote: > > > > > > > > > > > I agree on a 0.16.0 release. In the meantime I'll try to help out > > > with > > > > > > getting the Java side ready for 1.0. > > > > > > > > > > > > On Sat, Jan 4, 2020 at 7:21 PM Fan Liya > > > wrote: > > > > > > > > > > > > > Hi Jacques, > > > > > > > > > > > > > > ARROW-4526 is interesting. I would like to try to resolve it. > > > > > > > Thanks a lot for the information. 
> > > > > > > > > > > > > > Best, > > > > > > > Liya Fan > > > > > > > > > > > > > > > > > > > > > On Sun, Jan 5, 2020 at 6:14 AM Jacques Nadeau < > jacq...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > > > The third ticket I was commenting on was ARROW-4526. > > > > > > > > > > > > > > > > Fan, do you want to take a shot at that one? > > > > > > > > > > > > > > > > On Fri, Jan 3, 2020 at 8:16 PM Fan Liya < > liya.fa...@gmail.com> > > > wrote: > > > > > > > > > > > > > > > > > Hi Jacques, > > > > > > > > > > > > > > > > > > I am interested in the issues, and if it is possible, I > would > > > like to > > > > > > > try > > > > > > > > > to resolve them. > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > Liya Fan > > > > > > > > > > > > > > > > > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau < > > > jacq...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > I identified three things in the java library that I > think > > > are top > > > > > > of > > > > > > > > > mind > > > > > > > > > > and should be fixed before 1.0 to avoid weird > incompatibility >
[Discuss][Rust] Policy regarding "unsafe"
Hi All, This time last year there was a brief discussion on the usage of unsafe in Rust (a user on GitHub raised the issue and I created the JIRA). [1] So far we mostly avoid unsafe in the public APIs. The thinking here is that Arrow is a "development platform", i.e. lower level than most libraries, and library builders will want to avoid any performance hit of bounds checking, etc. This is not typical in the Rust community, where unsafe is a clear signal that care is needed. Although it might clutter the API a little more, I would be in favor of having safe and unsafe variants of methods as needed. For instance, "value" for array access would be changed to "value" and "value_unchecked", where the latter is unsafe and does not perform bounds checks. We don't have a huge number of libraries building on top of Arrow in Rust at the moment, so it seems like a good time, before 1.0, to decide on this to avoid breaking changes to the public API post-1.0. Thoughts? Paddy [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
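To make the proposal concrete, here is a minimal sketch of what a safe/unsafe accessor pair could look like. The `Int32Array` type below is a hypothetical stand-in backed by a `Vec<i32>`, not the actual array type from the Arrow Rust crate, and `sum` is an illustrative helper; only the method names "value" and "value_unchecked" come from the message above.

```rust
/// Hypothetical stand-in for an Arrow primitive array (assumption for
/// illustration only; this is not the Arrow Rust crate's real type).
pub struct Int32Array {
    values: Vec<i32>,
}

impl Int32Array {
    pub fn new(values: Vec<i32>) -> Self {
        Int32Array { values }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Safe accessor: bounds-checked, panics if `i >= self.len()`.
    pub fn value(&self, i: usize) -> i32 {
        self.values[i]
    }

    /// Unchecked accessor: skips the bounds check.
    ///
    /// # Safety
    /// The caller must guarantee that `i < self.len()`.
    pub unsafe fn value_unchecked(&self, i: usize) -> i32 {
        // SAFETY: the caller promises that `i` is in bounds.
        unsafe { *self.values.get_unchecked(i) }
    }
}

/// Illustrative hot loop: the loop bound upholds the safety contract,
/// so the unchecked accessor can be used without a per-element check.
pub fn sum(array: &Int32Array) -> i64 {
    let mut total = 0i64;
    for i in 0..array.len() {
        // SAFETY: `i` is always less than `array.len()`.
        total += i64::from(unsafe { array.value_unchecked(i) });
    }
    total
}
```

With this shape, ordinary callers use `value` and never write `unsafe`, while library builders who have already validated their indices can opt into `value_unchecked` and take responsibility for the invariant at the call site.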
[jira] [Created] (ARROW-7538) Clarify actual and desired size in AllocationManager
David Li created ARROW-7538: --- Summary: Clarify actual and desired size in AllocationManager Key: ARROW-7538 URL: https://issues.apache.org/jira/browse/ARROW-7538 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li As a follow up to the review of ARROW-7329, we should clarify the different sizes (desired vs actual size) in AllocationManager: https://github.com/apache/arrow/pull/5973#discussion_r354729754 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7539) [Java] FieldVector getFieldBuffers API should not set reader/writer indices
Ji Liu created ARROW-7539: - Summary: [Java] FieldVector getFieldBuffers API should not set reader/writer indices Key: ARROW-7539 URL: https://issues.apache.org/jira/browse/ARROW-7539 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Per discussion [https://github.com/apache/arrow/pull/6133#discussion_r364906302]. The fact that we have reader/writer settings in {{getFieldBuffers}} is wrong. To clarify, {{getFieldBuffers}} is distinct from {{getBuffers}}. The former should be for getting access to underlying data for higher-performance algorithms. The latter is for sending the data over the wire. It seems we've mixed up the use of the two. Currently, {{VectorUnloader}} uses {{getFieldBuffers}} to create the {{ArrowRecordBatch}}, which is why we keep writer/reader indices in {{getFieldBuffers}}; we should use {{getBuffers}} there instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7540) [C++] License files aren't installed
Kouhei Sutou created ARROW-7540: --- Summary: [C++] License files aren't installed Key: ARROW-7540 URL: https://issues.apache.org/jira/browse/ARROW-7540 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7541) [GLib] Install license files
Kouhei Sutou created ARROW-7541: --- Summary: [GLib] Install license files Key: ARROW-7541 URL: https://issues.apache.org/jira/browse/ARROW-7541 Project: Apache Arrow Issue Type: Improvement Components: GLib Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)
Hi Wes, I'm still interested in doing the work, but I don't want to hold anybody up if they have bandwidth. In order to actually make progress on this, my plan will be to: 1. Help with the current Java review backlog through early next week or so (this has been taking the majority of my time allocated for Arrow contributions for the last 6 months or so). 2. Shift all my attention to trying to get this done (this means no reviews other than closing out existing ones that I've started until it is done). Hopefully, other Java committers can help shrink the backlog further (Jacques, thanks for your recent efforts here). Thanks, Micah On Thu, Jan 9, 2020 at 8:16 AM Wes McKinney wrote: > hi folks, > > I think we have reached a point where the incomplete C++ Parquet > nested data assembly/disassembly is harming the value of several > other parts of the project, for example the Datasets API. As another > example, it's possible to ingest nested data from JSON but not write > it to Parquet in general. > > Implementing the nested data read and write path completely is a > difficult project requiring at least several weeks of dedicated work, > so it's not so surprising that it hasn't been accomplished yet. I know > that several people have expressed interest in working on it, but I > would like to see if anyone would be able to volunteer a commitment of > time and guess on a rough timeline when this work could be done. It > seems to me if this slips beyond 2020 it will significantly diminish the > value being created by other parts of the project. > > Since I'm pretty familiar with all the Parquet code I'm one candidate > person to take on this project (and I can dedicate the time, but it > would come at the expense of other projects where I can also be > useful). But Micah and others expressed interest in working on it, so > I wanted to have a discussion about it to see what others think. > > Thanks > Wes >
[jira] [Created] (ARROW-7542) [CI][C++] nproc isn't available on macOS
Kouhei Sutou created ARROW-7542: --- Summary: [CI][C++] nproc isn't available on macOS Key: ARROW-7542 URL: https://issues.apache.org/jira/browse/ARROW-7542 Project: Apache Arrow Issue Type: Improvement Components: C++, Continuous Integration Reporter: Kouhei Sutou Assignee: Kouhei Sutou https://github.com/apache/arrow/runs/38286#step:5:32 {noformat} ci/scripts/cpp_test.sh: line 31: nproc: command not found {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7543) arrow::write_parquet() code examples do not work
Keith Hughitt created ARROW-7543: Summary: arrow::write_parquet() code examples do not work Key: ARROW-7543 URL: https://issues.apache.org/jira/browse/ARROW-7543 Project: Apache Arrow Issue Type: Bug Components: Documentation Affects Versions: 0.15.1 Reporter: Keith Hughitt The code examples in the docs for the R {{arrow::write_parquet()}} method are broken. Fixed in PR: https://github.com/apache/arrow/pull/6157 -- This message was sent by Atlassian Jira (v8.3.4#803005)