[GitHub] [arrow-site] liyafan82 commented on a change in pull request #153: 6.0 release blog post

GitBox Wed, 03 Nov 2021 18:34:39 -0700


liyafan82 commented on a change in pull request #153:
URL: https://github.com/apache/arrow-site/pull/153#discussion_r739613309




##########
File path: _posts/2021-10-22-6.0.0-release.md
##########
@@ -0,0 +1,184 @@
+---
+layout: post
+title: "Apache Arrow 6.0.0 Release"
+date: "2021-10-22 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!--
+
+To use this template:
+
+* Update all "XX" values with the appropriate numbers (you can get the 
resolved issues and contributors count from `_release/6.0.0.md`)
+* Fill in the various sections below. Note that the audience is the broader 
user community, not Arrow developers, so please write clearly using terms they 
will understand and care about. Delete any sections that don't have any content 
(as in, there are no changes to announce)
+* Delete this introductory comment
+
+ -->
+
+
+The Apache Arrow team is pleased to announce the 6.0.0 release. This covers
+over 3 months of development work and includes [**572 resolved issues**][1]
+from [**77 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+## Columnar Format Notes
+* A new calendar interval type consisting of Month, Day and Nanoseconds has 
been added to the specification. Reference implementations exist in Java, 
C++ and Python.
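
As a rough illustration of the interval type described above, the sketch below models one value as three independent signed fields (32-bit months, 32-bit days, 64-bit nanoseconds) packed into a 16-byte block. The class `MonthDayNano` and its methods are hypothetical names written in plain Python for this example; they are not part of any Arrow library, and little-endian byte order is assumed.

```python
import struct
from dataclasses import dataclass

# Assumed layout for illustration: int32 months, int32 days, int64 nanoseconds,
# little-endian, no padding (4 + 4 + 8 = 16 bytes).
_LAYOUT = struct.Struct("<iiq")


@dataclass(frozen=True)
class MonthDayNano:
    """Hypothetical pure-Python model of a month-day-nano interval value."""

    months: int       # signed 32-bit
    days: int         # signed 32-bit
    nanoseconds: int  # signed 64-bit

    def to_bytes(self) -> bytes:
        # The three fields are independent: no normalization between
        # months, days and nanoseconds is performed.
        return _LAYOUT.pack(self.months, self.days, self.nanoseconds)

    @classmethod
    def from_bytes(cls, raw: bytes) -> "MonthDayNano":
        return cls(*_LAYOUT.unpack(raw))


value = MonthDayNano(months=1, days=-2, nanoseconds=3_000_000_000)
assert MonthDayNano.from_bytes(value.to_bytes()) == value
```

Keeping the fields separate means that "1 month" and "30 days" remain distinct values, which is the point of a calendar interval.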
+## Arrow Flight RPC notes
+
+GLib and Ruby have added bindings for Arrow Flight.
+
+While not part of the release, work is ongoing on Arrow Flight SQL, which 
defines a protocol for clients to communicate with SQL databases using Arrow 
Flight. For those interested in the project, please reach out on the [mailing 
list](https://arrow.apache.org/community/).
+## C++ notes
+
+The month-day-nano interval type has been added (ARROW-13628).
+
+Various APIs, including extension types and scalars, are no longer 
experimental (ARROW-5244).
+
+Support for Visual Studio 2015 was dropped (ARROW-14070).
+
+### Compute Layer
+
+A basic in-memory query engine has been implemented and is accessible from the 
R bindings. Operations including filter, project, sort, equality joins, and 
various aggregations are supported.
+
+The following compute functions have been added:
+
+* aggregate functions: `approximate_median`, `count_distinct`, `max`, `min`, 
`product`
+* hash aggregate functions: `hash_all`, `hash_any`, `hash_approximate_median`, 
`hash_count_distinct`, `hash_distinct`, `hash_max`, `hash_mean`, `hash_min`, 
`hash_product`, `hash_stdev`, `hash_variance`
+* scalar arithmetic functions: `logb`, `round`, `round_to_multiple`
+* scalar string functions: `ascii_capitalize`, `ascii_swapcase`, 
`ascii_title`, `utf8_capitalize`, `utf8_swapcase`, `utf8_title`
+* scalar temporal functions: `assume_timezone`, `day_time_interval_between`, 
`days_between`, `hours_between`, `microseconds_between`, 
`milliseconds_between`, `minutes_between`, `month_day_nano_interval_between`, 
`month_interval_between`, `nanoseconds_between`, `quarters_between`, 
`seconds_between`, `strftime`, `us_week`, `week`, `weeks_between`, 
`years_between`
+* other scalar functions: `choose`, `list_element`
+* vector functions: `drop_null`, `select_k_unstable`
+
+In general, type support has been improved for most of the compute functions, 
but work here is ongoing, particularly around decimal support.
+
+Crashes have been fixed in particular cases for `take`, `filter`, `unique`, 
and `value_counts` (ARROW-13474, ARROW-13509, ARROW-14129).
+
+Hash aggregations (i.e. group by) support scalar and array values 
(ARROW-13737, ARROW-14027).
+
+Temporal functions are now timezone-aware (e.g. when extracting the hour of a 
timestamp) (ARROW-12980).
+
+`count` can optionally count all values, not just null or non-null values 
(ARROW-13574).
+
+`fill_null` has been replaced by the more general `coalesce` (ARROW-7179).
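
To show why `coalesce` is the more general operation, here is a minimal pure-Python sketch of its row-wise semantics: for each row, take the first non-null value across the inputs. The two functions below are illustrative stand-ins operating on Python lists with `None` as null; they are not the Arrow kernels themselves.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def coalesce(*columns: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Return, for each row, the first non-null value across the inputs."""
    result: List[Optional[T]] = []
    for row in zip(*columns):
        # next() falls back to None when every input is null in this row.
        result.append(next((v for v in row if v is not None), None))
    return result


def fill_null(values: Sequence[Optional[T]], fill_value: T) -> List[Optional[T]]:
    """fill_null expressed as a two-input coalesce with a constant column."""
    return coalesce(values, [fill_value] * len(values))


print(coalesce([1, None, None], [None, 2, None]))
print(fill_null([1, None, 3], 0))
```

In this model, filling nulls with a constant is just the special case where the second input is a constant column.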
+
+`is_null` can optionally consider NaN as null (ARROW-12959).
+
+Sorting has been optimized (ARROW-10898, ARROW-14165). Also, null values can 
now be sorted at either the beginning or the end (ARROW-12063).
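
The null-placement behaviour mentioned above can be sketched in plain Python as follows. The function name `sort_indices` and the option values `"at_start"` and `"at_end"` are chosen for this illustration and operate on Python lists with `None` as null; treat them as assumptions rather than a description of the C++ API.

```python
from typing import List, Optional, Sequence


def sort_indices(values: Sequence[Optional[int]],
                 null_placement: str = "at_end") -> List[int]:
    """Return indices that sort `values` ascending, with nulls grouped
    at the start or at the end."""
    if null_placement not in ("at_start", "at_end"):
        raise ValueError(f"unknown null_placement: {null_placement!r}")
    # Sort only the non-null positions; sorted() is stable.
    valid = sorted(
        (i for i, v in enumerate(values) if v is not None),
        key=lambda i: values[i],
    )
    # Nulls keep their original relative order.
    nulls = [i for i, v in enumerate(values) if v is None]
    return nulls + valid if null_placement == "at_start" else valid + nulls


data = [3, None, 1, None, 2]
print(sort_indices(data, "at_end"))
print(sort_indices(data, "at_start"))
```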
+
+### CSV
+
+The CSV reader can read time32 and time64 types, and will infer time32 values 
for columns in the format "hh:mm" and "hh:mm:ss" (ARROW-11243).
+
+The decimal point can be customized when reading (ARROW-13421).
+
+The streaming reader will not unintentionally infer null-typed columns when 
using the various skip options (ARROW-13441).
+
+If a row has an incorrect number of columns, now the row can be skipped 
instead of raising an error (ARROW-12673).
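
The skip-or-error choice for malformed rows can be pictured with the following sketch, which uses Python's standard `csv` module rather than the Arrow reader. The function `read_rows` and its `on_invalid_row` parameter are hypothetical names introduced only for this example.

```python
import csv
import io
from typing import List, Tuple


def read_rows(text: str, on_invalid_row: str = "error") -> Tuple[List[str], List[List[str]]]:
    """Parse CSV text; rows whose column count differs from the header
    are either skipped or reported as an error."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows: List[List[str]] = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            if on_invalid_row == "skip":
                continue  # drop the malformed row and keep reading
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
        rows.append(row)
    return header, rows


text = "a,b\n1,2\n3\n4,5\n"
header, rows = read_rows(text, on_invalid_row="skip")
print(header, rows)
```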
+
+The option `quoted_strings_can_be_null` applies to all column types now, not 
just strings (ARROW-13580). When quoting is disabled entirely, the reader now 
takes advantage of this to improve performance (ARROW-14150).
+
+A CSVWriter object is now exposed, allowing for incremental writing 
(ARROW-11828). Dates can now be written (ARROW-12540).
+
+### Dataset Layer
+
+The dataset writer was refactored, and now supports more options, including a 
limit on the number of files open at once, compatibility with the async 
scanner, a limit on the number of rows written per file, and control over what 
to do when files already exist in the target directory (ARROW-13650). 
Additionally, the query engine can feed into the dataset writer as a sink 
(ARROW-13542).
+
+The asynchronous scanner now properly respects backpressure (ARROW-13611, 
ARROW-14192), as does the writer (ARROW-14191).
+
+ORC datasets are supported (ARROW-13572) with support for column projection 
pushdown (ARROW-13797).
+
+The Parquet/IPC format readers now respect the batch_size scanner option 
(ARROW-14024). Also, the Parquet reader now properly implements readahead for 
better performance (ARROW-14026).
+
+### IO and Filesystem Layer
+
+The retry strategy of S3FileSystem can be customized (ARROW-13508). When 
writing to an existing bucket as a user with limited permissions, Arrow will no 
longer emit a spurious "Access Denied" error (ARROW-13685).
+
+On macOS with NFS mounts, a "[errno 25] Inappropriate ioctl for device" error 
was fixed (ARROW-13983). 
+
+The basics of a Google Cloud Storage filesystem have been added; work is in 
progress for full support (ARROW-8147, ARROW-14222, ARROW-14223, ARROW-14232, 
ARROW-14236, ARROW-14345, ARROW-14157).
+
+### JSON
+
+A crash was fixed when duplicate keys were present (ARROW-14109). 
+
+### Parquet
+
+Written min/max and null_count statistics for dictionary types were corrected 
(ARROW-11634, ARROW-12513).
+null_count statistics for columns that contain repeated data were corrected.
+
+file_offset for row groups was not being populated according to the 
specification; this issue has been corrected.
+
+Column selection now works for repeated columns and structs of more than one 
level.
+
+An error with large files when built with Thrift 0.14 was fixed (ARROW-13655). 
+
+The ParquetVersion enum was updated with more values to support finer-grained 
Parquet format version selection (ARROW-13794). 
+
+Writer performance was improved by avoiding repeated dynamic casts 
(ARROW-13965). 
+
+## C# notes
+
+This release includes improved support for dictionary arrays, as well as 
integration testing with the other Arrow implementations for the primitive and 
decimal types.
+
+## Go notes
+
+## Java notes
+

Review comment:
       ```suggestion
   * Some dependent libraries were upgraded. In particular, grpc was upgraded 
to 1.41.0, netty to 2.0.43, and orc to 1.7.0 (ARROW-14198, ARROW-14049)
   * Fixed the problem of appending BitVectors in batch (ARROW-13981)
   * Code coverage support enabled for Java (ARROW-13859)
   * Fixed the incorrect string representations for unsigned integer vectors 
(ARROW-13792)
   * Reduced the memory consumption of JDBC adapters by reusing record batches 
(ARROW-13733)
   * Allowed NullVectors to have distinct field names (ARROW-13645)
   * Some long-deprecated APIs have been removed (ARROW-13544)
   * Allowed passing empty columns for projection in Dataset (ARROW-13257)
   * A Java implementation of Arrow C data interface was provided (ARROW-12965)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
