Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

2018-01-10 Thread Dongjoon Hyun
Hi, All. Vectorized ORC Reader is now supported in Apache Spark 2.3. https://issues.apache.org/jira/browse/SPARK-16060 It has been a long journey. From now on, Spark can read ORC files faster without feature penalty. Thank you for all your support, especially Wenchen Fan. It's done by two

Re: Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

2018-01-28 Thread Dongjoon Hyun
; Hi > > Thanks for this work. > > Will this affect both: > 1) spark.read.format("orc").load("...") > 2) spark.sql("select ... from my_orc_table_in_hive") > > ? > > > On Jan 10, 2018, at 20:14, Dongjoon Hyun wrote: > > Hi, All. > > > >
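
The reply above is cut off in this listing, so its actual answer is not reproduced here. As background only, and as an assumption drawn from Spark's documented SQL settings rather than from this thread: the two access paths in the question are typically governed by different options. The data source path (spark.read.format("orc")) follows spark.sql.orc.impl and spark.sql.orc.enableVectorizedReader, while Hive metastore ORC tables are read with Spark's built-in reader only when spark.sql.hive.convertMetastoreOrc is enabled. The sketch below just collects those settings and renders them as spark-submit arguments; ORC_READER_CONF and to_submit_args are hypothetical names used for illustration.

```python
# Assumed background (not taken from the thread): Spark SQL settings that
# influence which ORC reader is used in Spark 2.3 and later.
ORC_READER_CONF = {
    # Use Spark's built-in ORC data source instead of the Hive-based one.
    "spark.sql.orc.impl": "native",
    # Enable vectorized decoding in the native implementation.
    "spark.sql.orc.enableVectorizedReader": "true",
    # Read Hive metastore ORC tables with Spark's built-in ORC reader.
    "spark.sql.hive.convertMetastoreOrc": "true",
}


def to_submit_args(conf):
    """Render settings as `--conf key=value` pairs (hypothetical helper)."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


if __name__ == "__main__":
    # Print a fragment that could be appended to a spark-submit command line.
    print(" ".join(to_submit_args(ORC_READER_CONF)))
```

The same keys can also be set on an existing session with spark.conf.set(key, value). Defaults for these options differ between Spark releases, so check the documentation for the version in use.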

[ANNOUNCE] Announcing Apache ORC 1.6.6

2020-12-11 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.6.6! https://orc.apache.org/news/2020/12/10/ORC-1.6.6/ 1.6.6 is a maintenance release containing several backward-compatibility fixes. This release is based on the branch-1.6 maintenance branch of Apache ORC. It's available

[ANNOUNCE] Announcing Apache ORC 1.6.7

2021-01-22 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.6.7! https://orc.apache.org/news/2021/01/22/ORC-1.6.7/ 1.6.7 is a maintenance release containing several backward-compatibility fixes. This release is based on the branch-1.6 maintenance branch of Apache ORC. It's available

[ANNOUNCE] Announcing Apache ORC 1.6.8

2021-05-22 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.6.8! https://orc.apache.org/news/2021/05/21/ORC-1.6.8/ 1.6.8 is a maintenance release containing several important fixes. This release is based on the branch-1.6 maintenance branch of Apache ORC. It's available in Apache

[ANNOUNCE] Announcing Apache ORC 1.6.9

2021-07-02 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.6.9! https://orc.apache.org/news/2021/07/02/ORC-1.6.9/ 1.6.9 is a maintenance release containing several important fixes. This release is based on the branch-1.6 maintenance branch of Apache ORC. It's available in Apache

Re: HiveDecimalWritable(J)V

2021-03-26 Thread Dongjoon Hyun
Hi, Vladimir. Could you provide a simple reproducer example? Bests, Dongjoon. On Thu, Mar 25, 2021 at 11:46 PM Vladimir Goncharov < vova.goncharov.2...@gmail.com> wrote: > I've tried 4 variations in my pom.xml file: > 1: > > org.apache.orc > orc-mapreduce >

Re: HiveDecimalWritable(J)V

2021-04-04 Thread Dongjoon Hyun
On Mar 26, 2021, at 19:00, Dongjoon Hyun : > >> Hi, Vladimir. >> >> Could you provide a simple reproducer example? >> >> Bests, >> Dongjoon. >> >> >> On Thu, Mar 25, 2021 at 11:46 PM Vladimir Goncharov < >> vova.go

[ANNOUNCE] Announcing Apache ORC 1.6.11

2021-09-15 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.6.11! https://orc.apache.org/news/2021/09/15/ORC-1.6.11/ 1.6.11 is a maintenance release containing several important fixes. This release is based on the branch-1.6 maintenance branch of Apache ORC. It's available in Apache

[ANNOUNCE] Announcing Apache ORC 1.5.13

2021-09-15 Thread Dongjoon Hyun
Hi All. Since Apache ORC 1.5.0 was released on May 14th, 2018, branch-1.5 has been maintained for over 3 years. Now, we are happy to announce the availability of Apache ORC 1.5.13! https://orc.apache.org/news/2021/09/15/ORC-1.5.13/ It's available in Apache Downloads and Maven Central.

[ANNOUNCE] Announcing Apache ORC 1.7.0

2021-09-19 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.7.0! https://orc.apache.org/news/2021/09/15/ORC-1.7.0/ 1.7.0 is a new release resolving 93 JIRA issues including several new features. It's available in Apache Downloads, Maven Central, and Homebrew.

Apache ORC 1.6.9 Adoption (2021-07-30)

2021-07-30 Thread Dongjoon Hyun
Hi, All. We highly recommend Apache ORC 1.6.9 as the latest and the most stable one. The Apache ORC community has been trying to deliver the latest improvements and bug fixes to all users. The following is the Apache ORC 1.6.9 adoption status in several Apache top-level projects. -

[ANNOUNCE] Announcing Apache ORC 1.7.2

2021-12-20 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.7.2! https://orc.apache.org/news/2021/12/20/ORC-1.7.2/ 1.7.2 is a maintenance release containing 13 important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.7.2/

Re: Avro vs ORC in Spark

2021-11-09 Thread Dongjoon Hyun
Hi, Ryan. I don't think you have one 100GB Avro file in production. :) If you have one million 1MB or one thousand 1GB Avro files, it becomes a completely different story. Most big data compute engines like Spark/Hive/Trino/Impala support both of them because the use cases are different. I'd

Re: ORCFile.createWriter fail after upgrade orc-core from 1.2.3 to 1.7.0

2021-11-03 Thread Dongjoon Hyun
Hi, Anthony. Thank you for trying 1.7.0. It seems that your unit test reuses the test file name. For breaking changes, I also raised similar breaking change issues at 1.6.x and fixed some in order to help the downstream migration. TITLE: Apache ORC Versioning (Semantic Versioning)

Re: ORCFile.createWriter fail after upgrade orc-core from 1.2.3 to 1.7.0

2021-11-03 Thread Dongjoon Hyun
d long time ago. > At the end I fix my unit test by providing only a temporary File object > without creating a real file in FileSystem. > That's actually what my project's non-unit test code has already been > doing. > > Anthony > > > > > > On Wed, Nov 3, 2021 a

[ANNOUNCE] Announcing Apache ORC 1.7.1

2021-11-07 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.7.1! https://orc.apache.org/news/2021/11/07/ORC-1.7.1/ 1.7.1 is a maintenance release containing 22 important fixes. It's available in Apache Downloads, Maven Central, and Homebrew.

Python & Apache ORC

2022-02-15 Thread Dongjoon Hyun
Hi, All. Recently, PyArrow 7.0.0 started to provide a better API for ORC via ARROW-15338. New APIs are officially documented in the Apache ORC website too. https://orc.apache.org/docs/pyarrow.html If you are a Python user, please try it and let us know your feedback. We want to improve the

[ANNOUNCE] Announcing Apache ORC 1.7.3

2022-02-10 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.7.3! https://orc.apache.org/news/2022/02/09/ORC-1.7.3/ 1.7.3 is a maintenance release containing 50 important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.7.3/

[ANNOUNCE] Announcing Apache ORC 1.6.13

2022-01-20 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.6.13! https://orc.apache.org/news/2022/01/20/ORC-1.6.13/ 1.6.13 is a maintenance release containing 6 important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.6.13/

[ANNOUNCE] Announcing Apache ORC 1.6.14

2022-04-14 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.6.14! https://orc.apache.org/news/2022/04/14/ORC-1.6.14/ 1.6.14 is a maintenance release containing 5 important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.6.14/

Re: [ANNOUNCE] Announcing Apache ORC 1.8.5

2023-09-06 Thread Dongjoon Hyun
Thank you for leading this, Gang. It's great for the Apache ORC community to have a new official writer ID, CUDF. ORC-1489 Assign a writer id to CUDF I hope Apache ORC 1.8.5+ helps the downstream projects more. Bests, Dongjoon. On Tue, Sep 5, 2023 at 9:18 AM Gang Wu wrote: > Hi All. > > We

[ANNOUNCE] Announcing Apache ORC 1.9.1

2023-08-16 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.9.1! https://orc.apache.org/news/2023/08/16/ORC-1.9.1/ 1.9.1 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.9.1/

[ANNOUNCE] Announcing Apache ORC 1.8.6

2023-11-10 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.8.6! https://orc.apache.org/news/2023/11/10/ORC-1.8.6/ 1.8.6 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.8.6/

[ANNOUNCE] Announcing Apache ORC 1.7.10

2023-11-10 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.7.10! https://orc.apache.org/news/2023/11/10/ORC-1.7.10/ 1.7.10 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.7.10/

[ANNOUNCE] Announcing Apache ORC 1.9.2

2023-11-10 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.9.2! https://orc.apache.org/news/2023/11/10/ORC-1.9.2/ 1.9.2 is a maintenance release containing bug fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.9.2/

Re: [ANNOUNCE] Announcing Apache ORC 1.7.5

2022-06-21 Thread Dongjoon Hyun
Thank you so much for leading this again, William. I also saw your PRs on our downstream projects. MERGED: - Homebrew: https://github.com/Homebrew/homebrew-core/pull/103871 - SPARK-39493: https://github.com/apache/spark/pull/36892 - ARROW-16848: https://github.com/apache/arrow/pull/13392 CI

Re: Mysterious exceptions when writing ORC files

2022-06-12 Thread Dongjoon Hyun
Hi, Matthias. May I ask which previous ORC version worked for you? Dongjoon On 2022/06/12 14:32:32 Matthias Meier wrote: > Hi, > > we are using the Apache ORC library to generate ORC files. That works > generally well but every now and then we run into two kinds of exceptions >

Re: [ANNOUNCE] Announcing Apache ORC 1.8.0

2022-09-03 Thread Dongjoon Hyun
It's great! Thank you, William. Dongjoon On Sat, Sep 3, 2022 at 7:30 PM William H. wrote: > Hi All. > > We are happy to announce the availability of Apache ORC 1.8.0! > > https://orc.apache.org/news/2022/09/03/ORC-1.8.0/ > > 1.8.0 is a minor release containing new features and

Re: [ANNOUNCE] Announcing Apache ORC 1.7.6

2022-08-25 Thread Dongjoon Hyun
It's really great. Thank you, William. Dongjoon. On Thu, Aug 25, 2022 at 12:27 AM William H. wrote: > Hi All. > > As of today, Apache ORC 1.7.6 is applied in the following projects: > > SPARK-40134 Update ORC to 1.7.6 (Apache Spark 3.4.0, 3.3.1) > >

Re: ORC list column size

2023-01-10 Thread Dongjoon Hyun
It sounds interesting. Are you writing and reading ORC files programmatically via the ORC library? Or, do you use Spark/Flink/PyArrow/Dask? Dongjoon On Tue, Jan 10, 2023 at 8:23 AM Hinko Kocevar wrote: > I would like to use ORC file to hold several columns of data. One of the > columns will be a list

[ANNOUNCE] Announcing Apache ORC 1.8.1

2022-12-02 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.8.1! https://orc.apache.org/news/2022/12/02/ORC-1.8.1/ 1.8.1 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.8.1/

[ANNOUNCE] Announcing Apache ORC 1.7.7

2022-11-17 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.7.7! https://orc.apache.org/news/2022/11/17/ORC-1.7.7/ 1.7.7 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.7.7/

[ANNOUNCE] Announcing Apache ORC 1.8.2

2023-01-13 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.8.2! https://orc.apache.org/news/2023/01/13/ORC-1.8.2/ 1.8.2 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.8.2/

Re: [ANNOUNCE] Announcing Apache ORC 1.7.8

2023-01-22 Thread Dongjoon Hyun
Thank you always William! Dongjoon On Sat, Jan 21, 2023 at 6:26 PM William H. wrote: > Hi All. > > We are happy to announce the availability of Apache ORC 1.7.8! > > - https://orc.apache.org/news/2023/01/21/ORC-1.7.8/ > > 1.7.8 is a maintenance release containing bug fixes. > It's available in

[ANNOUNCE] Announcing Apache ORC 1.8.3

2023-03-15 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.8.3! https://orc.apache.org/news/2023/03/15/ORC-1.8.3/ 1.8.3 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.8.3/

Re: [ANNOUNCE] Welcome Xin Zhang as an ORC committer!

2023-02-11 Thread Dongjoon Hyun
Welcome, Xin! :) Dongjoon. On Fri, Feb 10, 2023 at 9:21 PM Gang Wu wrote: > Welcome Xin! > > Best, > Gang > > On Sat, Feb 11, 2023 at 1:07 PM William H. wrote: > > > The Apache ORC PMC recently added Xin Zhang > > (https://github.com/coderex2522) as a committer. > > > > Please join me in

[ANNOUNCE] Announcing Apache ORC 1.9.0

2023-06-28 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.9.0! https://orc.apache.org/news/2023/06/28/ORC-1.9.0/ 1.9.0 is a minor release containing new features and improvements. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.9.0/

Re: [ANNOUNCE] Announcing Apache ORC 1.7.9

2023-05-07 Thread Dongjoon Hyun
It's great! Thank you, Gang. Dongjoon. On Sun, May 7, 2023 at 2:02 AM Gang Wu wrote: > Hi All. > > We are happy to announce the availability of Apache ORC 1.7.9! > > https://orc.apache.org/news/2023/05/07/ORC-1.7.9/ > > 1.7.9 is a maintenance release containing important fixes. > It's

[FYI] ORC Default Compression - ZStandard

2024-01-10 Thread Dongjoon Hyun
Hi, All. The default ORC compression has been changed from ZLIB to ZStandard in order to make files smaller and faster, especially in cloud environments. Please check the relevant links for the details. - https://github.com/apache/orc/pull/1733 (ORC-1577: Use ZSTD as the default compression) -

Re: [ANNOUNCE] Announcing Apache ORC Format 1.0.0

2024-01-07 Thread Dongjoon Hyun
Finally! Thank you, William. Dongjoon. On Sat, Jan 6, 2024 at 14:33 William H. wrote: > Hi All. > > We are happy to announce the availability of Apache ORC Format 1.0.0! > > https://github.com/apache/orc-format/releases/tag/v1.0.0 > > The ORC Format project includes ORC specifications and

[ANNOUNCE] Announcing Apache ORC 1.8.7

2024-04-14 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 1.8.7! https://orc.apache.org/news/2024/04/14/ORC-1.8.7/ 1.8.7 is a maintenance release containing important fixes. It's available in Apache Downloads and Maven Central. https://downloads.apache.org/orc/orc-1.8.7/

Re: [ANNOUNCE] Announcing Apache ORC 2.0.1

2024-05-15 Thread Dongjoon Hyun
Thank you so much, William. Dongjoon. On Tue, May 14, 2024 at 10:50 PM William H. wrote: > Hi All! > > We are happy to announce the availability of Apache ORC 2.0.1! > > https://orc.apache.org/news/2024/05/14/ORC-2.0.1/ > > 2.0.1 is a maintenance release containing bug fixes and tool

Re: [ANNOUNCE] Announcing ORC availability in Conan

2024-03-11 Thread Dongjoon Hyun
It’s great news! Thank you so much. Dongjoon On Mon, Mar 11, 2024 at 19:05 Gang Wu wrote: > Hi All. > > We are happy to announce the availability of Apache ORC C++ library > in the conan center, which is the home of the popular C++ package > manager: > > https://conan.io/center/recipes/orc >

Re: [ANNOUNCE] Announcing Apache ORC 1.9.3

2024-03-21 Thread Dongjoon Hyun
Thank you! Dongjoon. On Wed, Mar 20, 2024 at 10:15 PM Gang Wu wrote: > Hi All. > > We are happy to announce the availability of Apache ORC 1.9.3! > > https://orc.apache.org/news/2024/03/20/ORC-1.9.3/ > > 1.9.3 is a maintenance release containing important fixes. > It's available in Apache

[ANNOUNCE] Announcing Apache ORC 2.0.0

2024-03-08 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache ORC 2.0.0! https://orc.apache.org/news/2024/03/08/ORC-2.0.0/ 2.0.0 is a major release containing new features and improvements like the following: - ORC-1547: Spin-off ORC Format - ORC-1572: Use Apache ORC Format 1.0.0

Re: [ANNOUNCE] Announcing Apache ORC 2.0.0

2024-03-11 Thread Dongjoon Hyun
Thank you! Dongjoon On Sun, Mar 10, 2024 at 2:53 PM William H. wrote: > Thank you Dongjoon for leading this major release! > > Bests, > William > > On Fri, Mar 8, 2024 at 2:34 PM Dongjoon Hyun > wrote: > > > > Hi All. > > > > We are happy to