[Java][Discuss]: consensus for JDK 8 deprecation

2023-09-14 Thread David Dali Susanibar Arce
Hi Arrow Java developers,

I would like to propose a timeline for dropping support for Java 8:
- Drop JDK 8 in Arrow v15 (2 releases from now)
- Add JDK 21 support before JDK 8 is removed

Why?
- Java 8 no longer receives Premier Support (1)
- Some Arrow Java (test) dependencies have already started to drop
Java 8 support, forcing us to pin to older package versions

Also note:
- gRPC Java may drop support for a JDK version once that version no
longer receives Premier Support from Oracle (2); see the gRPC
discussion of the Java 8 / Java 11 support timeline (3) for details
- Spark tentatively plans to drop JDK 8 support in Spark 4.0 (4),
which is expected around 2024-06 (5). Is it fine for us to drop
JDK 8 support before Spark does?

(1) https://www.oracle.com/java/technologies/java-se-support-roadmap.html
(2) https://github.com/grpc/proposal/pull/283/files#:~:text=gRPC%20Java%20may,support%5D.
(3) https://groups.google.com/g/grpc-io/c/-XK6Kd_19YQ/m/-4s07TzdAgAJ
(4) https://issues.apache.org/jira/browse/SPARK-44112
(5) https://www.mail-archive.com/dev@spark.apache.org/msg30460.html

Consider:
- Deprecating JDK 8 is not mandatory right now. We simply want to
devote more time to developing for the Java LTS versions 11, 17 and 21.
- Java 11 loses Premier Support this month.

Best regards,

--
David Susanibar


Re: Best practice on populating from VectorSchemaRoot to VectorSchemaRoot, ArrowStreamReader to ArrowStreamWriter

2023-04-11 Thread David Dali Susanibar Arce
Hi Wenbo Hu,

Sorry to join late. Wenbo, what about the approach shown in the Java
Flight Cookbook (1)? There, the acceptPut method acts as the upstream side
and uses VectorUnloader, while the getStream method acts as the downstream
side and uses VectorLoader. The cookbook initially used
ArrowRecordBatch.cloneWithTransfer, but that did not work in all scenarios,
so it was changed to VectorLoader.load (2).

Please let us know what you think.

(1) https://arrow.apache.org/cookbook/java/flight.html
(2) https://github.com/apache/arrow-cookbook/issues/218

Best regards,

David

On Mon, 3 Apr 2023 at 7:59, Wenbo Hu wrote:

> Hi,
>
> Consider this situation: on a doGet for a ticket on an Arrow Flight RPC
> server, the server retrieves several IPC upstreams (reading Parquet files
> through the Dataset API) and pushes them into the same downstream. How can
> this be implemented with fewer copies?
> Normally, with a single IPC upstream, I start the ServerStreamListener
> directly with the VectorSchemaRoot obtained from the upstream IPC reader
> (getVectorSchemaRoot).
> It seems that I have to deal with VectorSchemaRoot rather than
> ArrowRecordBatch directly.
> What is the proper implementation for populating root to root? Is it
> correct to use VectorLoader/VectorUnloader?
> Does this introduce extra steps by creating an intermediate
> ArrowRecordBatch unnecessarily? (ArrowBuf -> VectorSchemaRoot@UpstreamReader ->
> ArrowBuf@Loader -> VectorSchemaRoot@DownstreamWriter -> ArrowBuf)
>
> Maybe it relates to the allocator; is there a better implementation when
> using the same allocator?
> --
> Best Regards,
> Wenbo Hu,
>


Re: [External] Re: row counts in footer of IPC file format

2023-03-31 Thread David Dali Susanibar Arce
Hi Team,

Martin, it would be useful to check whether this new Java functionality is
already implemented in other languages such as C++. If it is, we could treat
it as a must-have, and we could also check how it aligns with your current
implementation. Either way, I'm really interested in reviewing the PR.

I'm also interested in reviewing the PR for the row counts.


Best regards


David

On Tue, 28 Mar 2023 at 15:21, Traverse, Martin wrote:

> Hello,
>
> I could take a shot at the Java one if you like?
>
> I'm actually working in the codebase at the moment on something related
> that I was going to offer as a PR once it's ready. We use the Java Arrow
> library as the core of our data service, the VSR is our intermediate
> representation and we translate to/from various formats and across various
> storage backends. We really need non-blocking data read to make that
> efficient and scalable, so I've made alternate implementations of the
> Readers where you can feed in data as a series of ByteBuffer objects
> instead of calling loadNextBatch(). For streams this means feeding in bytes
> and buffering until a batch is available, for files we're reading the block
> info from the footer and then feeding in buffers (slices) for each block. I
> was able to reuse all the same serialization helpers etc.
>
> Does this sound useful? If it does then I can raise a PR for Arrow when
> it's done. No worries if not and we just keep the non-blocking readers in
> our own codebase. They're not a lot of code either way.
>
> Happy to take a shot at the row counts after that, weekend time probably.
> If I sketched out a draft PR would you be happy to take a look and tell me
> if I'm on the right lines?
>
> Kind regards,
>
> Martin Traverse
>
> -----Original Message-----
> From: Weston Pace 
> Sent: 28 March 2023 17:35
> To: dev@arrow.apache.org
> Subject: [External] Re: row counts in footer of IPC file format
>
> I suspect the next step will be to create two implementations and create
> test files for the integration test suite.  These will be required before
> we can vote on this.
>
> Are either of you interested in contributing an implementation (C++, Rust,
> Java, and Go have been the usual suspects in the past but JS or C# should
> be viable too)?  In the past, once an implementation & test files have been
> created for one language, it has been easier to drum up a volunteer to
> create a second implementation.
>


[DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

2022-05-25 Thread David Dali Susanibar Arce
Hi All,

The purpose of this email is to request comments on migrating Arrow Java to
the Java Platform Module System (JPMS), available in Java SE 9+ (1).

Current status:

- Arrow Java targets the Java SE 8 (1.8) specification

- Arrow Java works with Java SE 8/9/11/17

- This is possible because Java offers a “legacy mode”
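
A minimal sketch of what this looks like at runtime, assuming “legacy mode”
refers to running code from the classpath: on Java 9+, classes loaded from
the classpath are placed in an unnamed module, while JDK classes belong to
named platform modules. The class name LegacyModeCheck is made up for this
illustration.

```java
// Sketch: report which module owns a class (requires Java 9+).
public class LegacyModeCheck {

    // Describes the module that the given class belongs to.
    static String describe(Class<?> type) {
        Module module = type.getModule();
        // Classes loaded from the classpath live in an unnamed module.
        return module.isNamed() ? "named module " + module.getName() : "unnamed module";
    }

    public static void main(String[] args) {
        // JDK classes always come from named platform modules.
        System.out.println("String -> " + describe(String.class));
        // This class has no module-info.java, so when it is run from the
        // classpath it is reported as belonging to the unnamed module.
        System.out.println("LegacyModeCheck -> " + describe(LegacyModeCheck.class));
    }
}
```

A library that ships without module-info.java keeps working this way on newer
JDKs, but it does not get the encapsulation described in the proposal below.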

Proposal:

Migrate to the Java Platform Module System (JPMS). This draft PR (2)
contains an initial port of the following modules for evaluation:
Format / Memory Core / Memory Netty / Memory Unsafe / Vector.

Main Reason to migrate:

- JPMS offers strong encapsulation, well-defined interfaces, and explicit
dependencies (3)(4).

- JPMS offers reliable configuration, and security by hiding platform
internals.

- JPMS partially addresses the fact that code is read far more often than it
is written (roughly 80% reading / 20% writing).

- JPMS improves readability for that read/write ratio (cited as 90/10)
through module-info.java.

- Consistent logs: JPMS provides consistent logs that can actually be used to
solve the problem at hand.

- With jlink, we can build a custom runtime containing only the modules we
need (leaving out java.desktop and others, for example).

- Modules have also been implemented by other languages and platforms, such
as JavaScript (ES2015), C++ (C++20), and .NET (NuGet/.NET Core).

- Consider taking a look at this discussion about pros/cons (5).

- Eventual migration to JPMS is a practical necessity as more projects
migrate.
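
To make the encapsulation and explicit-dependency points more concrete, here
is a minimal sketch using the JDK's java.lang.module.ModuleDescriptor API
(Java 9+). The module and package names (org.example.*) are hypothetical
placeholders, not the names used in the draft PR, and the class name
ModuleDescriptorSketch is made up for this illustration.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;

public class ModuleDescriptorSketch {

    // Programmatic equivalent of this hypothetical module-info.java:
    //
    //   module org.example.memory.core {
    //       requires org.example.format;   // explicit dependency
    //       exports org.example.memory;    // public API
    //   }
    //
    // The package org.example.memory.internal is part of the module but is
    // not exported, so other modules cannot access it.
    static ModuleDescriptor build() {
        return ModuleDescriptor.newModule("org.example.memory.core")
                .requires("org.example.format")
                .exports("org.example.memory")
                .packages(Set.of("org.example.memory.internal"))
                .build();
    }

    // True if the package is exported to all modules (unqualified export).
    static boolean isExported(ModuleDescriptor descriptor, String pkg) {
        return descriptor.exports().stream()
                .anyMatch(e -> e.source().equals(pkg) && !e.isQualified());
    }

    public static void main(String[] args) {
        ModuleDescriptor d = build();
        System.out.println("module:   " + d.name());
        System.out.println("requires: " + d.requires());
        for (String pkg : d.packages()) {
            System.out.println(pkg + " exported? " + isExported(d, pkg));
        }
    }
}
```

In a real migration the same information would live in each module's
module-info.java, and the compiler and runtime would enforce it.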

Effort:

- First, we need to decide whether to move from Java SE 8 to Java SE 9+, or
to support both by shipping JARs for Java SE 8 as well as Java SE 9+.

- Migrate to JPMS bottom-up.

- Packages need to be unique across modules (e.g. org.apache.arrow.memory /
io.netty.buffer). See the draft PR for an initial proposal.

- Dependencies also need to be modularized. If some of our current
dependencies cannot be used as modules, this will be a blocker for our
modules (we could patch them, but that is extra effort).
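
The package-uniqueness requirement can be checked mechanically. Below is a
minimal sketch that reports packages claimed by more than one module. The
module-to-package layout in main is an assumption made for illustration only
and is not taken from the draft PR; the class name SplitPackageCheck is also
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitPackageCheck {

    // Returns each package that appears in more than one module, mapped to
    // the (alphabetically ordered) modules that contain it.
    static Map<String, List<String>> findSplitPackages(Map<String, List<String>> packagesByModule) {
        Map<String, List<String>> owners = new TreeMap<>();
        // Iterate modules in sorted order so the result is deterministic.
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(packagesByModule).entrySet()) {
            for (String pkg : entry.getValue()) {
                owners.computeIfAbsent(pkg, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Keep only packages owned by two or more modules.
        owners.values().removeIf(modules -> modules.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        // Assumed layout, for illustration only.
        Map<String, List<String>> layout = Map.of(
                "memory-core", List.of("org.apache.arrow.memory", "org.apache.arrow.memory.util"),
                "memory-netty", List.of("org.apache.arrow.memory", "io.netty.buffer"),
                "netty-buffer", List.of("io.netty.buffer"));
        findSplitPackages(layout).forEach((pkg, modules) ->
                System.out.println(pkg + " is split across " + modules));
    }
}
```

Any package reported here would have to be renamed or merged into a single
module before the modules involved can be resolved together.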

Killers:

- FIXME! I need your help to identify the killer reasons that would justify
pushing this implementation forward.

Please let us know whether migrating Arrow Java to JPMS is needed and should
be implemented.

Please use this document for any comments:
https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing

Resources used:

(1): https://openjdk.java.net/projects/jigsaw/spec/

(2): https://github.com/apache/arrow/pull/13072

(3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/

(4): https://github.com/nipafx/demo-jigsaw-reflection

(5): https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/

Best regards,

-- 
David