Jorge Leitão created ARROW-15144:
------------------------------------

             Summary: [Java] Unable to read IPC file in master
                 Key: ARROW-15144
                 URL: https://issues.apache.org/jira/browse/ARROW-15144
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java
            Reporter: Jorge Leitão
         Attachments: generated_primitive.arrow

I think that PR https://github.com/apache/arrow/pull/11709 may have caused a 
regression in reading IPC files.

Attached is an arrow file that can't be read by the Java implementation, but it 
can be read by all other implementations. Its contents correspond exactly to 
the generated_primitive.json.gz used in integration tests.

Background:
The integration CI pipeline in Rust's arrow2 started failing after the PR 
mentioned above. The logs show that every implementation except Java is able to 
consume the attached file (and, more generally, the files created by arrow2's 
implementation). Almost all tests broke after the PR, which suggests that the 
problem is not specific to this file but a broader issue.

Log: 
https://pipelines.actions.githubusercontent.com/RJ1isxNgLS0jQX3HKOGkLQjJSEMqOm4RfxnyKHS4o90jAsObvY/_apis/pipelines/1/runs/14655/signedlogcontent/2?urlExpires=2021-12-17T05%3A35%3A25.6055769Z&urlSigningMethod=HMACV1&urlSignature=Nx7nRNdrcUCbtvOnnXAYGDEuSEJUiDT%2BU2jNcqqp%2FEs%3D

The logs also suggest that the Java implementation may be leaking memory when 
the file comparison fails.

{code:java}
2021-12-16T05:38:37.6833847Z 05:38:37.622 [main] ERROR org.apache.arrow.tools.Integration - Incompatible files
2021-12-16T05:38:37.6835533Z java.lang.IllegalArgumentException: Different values in column:
2021-12-16T05:38:37.6836731Z f11: Timestamp(SECOND, UTC) at index 0: null != -62135596800
2021-12-16T05:38:37.6838188Z    at org.apache.arrow.vector.util.Validator.compareFieldVectors(Validator.java:133)
2021-12-16T05:38:37.6840563Z    at org.apache.arrow.vector.util.Validator.compareVectorSchemaRoot(Validator.java:107)
2021-12-16T05:38:37.6842476Z    at org.apache.arrow.tools.Integration$Command$3.execute(Integration.java:209)
2021-12-16T05:38:37.6843841Z    at org.apache.arrow.tools.Integration.run(Integration.java:119)
2021-12-16T05:38:37.6845214Z    at org.apache.arrow.tools.Integration.main(Integration.java:70)
2021-12-16T05:38:37.6846597Z    Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (894)
2021-12-16T05:38:37.6847623Z Allocator(ROOT) 0/894/442402/2147483647 (res/actual/peak/limit)
2021-12-16T05:38:37.6848029Z 
2021-12-16T05:38:37.6848996Z            at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:437)
2021-12-16T05:38:37.6851316Z            at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
2021-12-16T05:38:37.6882832Z            at org.apache.arrow.tools.Integration$Command$3.$closeResource(Integration.java:228)
2021-12-16T05:38:37.6884294Z            at org.apache.arrow.tools.Integration$Command$3.execute(Integration.java:228)
2021-12-16T05:38:37.6885249Z            ... 2 common frames omitted
{code}
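
To illustrate how the leak report ends up as a "Suppressed" entry in the trace
above: the $closeResource frame suggests the allocator is closed by a
try-with-resources block, so an exception thrown by close() is attached to the
comparison failure instead of replacing it. The following is only a minimal
sketch of that mechanism using the standard library; TrackingAllocator and
compare are hypothetical names and are not Arrow's actual classes or code.

```java
// Hypothetical stand-in for an allocator that refuses to close while
// memory is still outstanding. This is NOT Arrow's BaseAllocator.
final class SuppressedLeakDemo {

    static final class TrackingAllocator implements AutoCloseable {
        private long outstanding;

        void allocate(long bytes) { outstanding += bytes; }

        void release(long bytes) { outstanding -= bytes; }

        @Override
        public void close() {
            // Mirrors the shape of the message in the log, nothing more.
            if (outstanding != 0) {
                throw new IllegalStateException("Memory leaked: (" + outstanding + ")");
            }
        }
    }

    // Simulates a comparison that always fails. If the buffers are not
    // released before the failure, close() throws as well and that second
    // exception is recorded as suppressed on the first one.
    static Throwable compare(boolean releaseBeforeFailing) {
        try (TrackingAllocator allocator = new TrackingAllocator()) {
            allocator.allocate(894);
            if (releaseBeforeFailing) {
                allocator.release(894);
            }
            throw new IllegalArgumentException("Different values in column");
        } catch (IllegalArgumentException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable failure = compare(false);
        System.out.println(failure);
        for (Throwable suppressed : failure.getSuppressed()) {
            System.out.println("  Suppressed: " + suppressed);
        }
    }
}
```

If this reading is right, the leak may be a side effect of the failure path not
releasing its buffers rather than an independent bug, but the log alone does
not confirm that.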

I can't rule out that this is an issue in arrow2, or that it involves some 
not-yet-identified issue in the implementation - I am raising it here because 
all other implementations can read the files.
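
As a reading aid for the mismatch reported in the log (null != -62135596800 for
f11: Timestamp(SECOND, UTC)), the expected value can be decoded with the
standard library, assuming it is a count of seconds since the Unix epoch. The
class and method names below are hypothetical and only serve the illustration.

```java
import java.time.Instant;

final class TimestampSecondsDemo {

    // Interprets a raw Timestamp(SECOND, UTC) value as seconds since
    // 1970-01-01T00:00:00Z (assumption stated in the text above).
    static Instant decodeSeconds(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds);
    }

    public static void main(String[] args) {
        // Value taken from the log: the side of the comparison that was not null.
        System.out.println(decodeSeconds(-62135596800L)); // prints 0001-01-01T00:00:00Z
    }
}
```

So the value at index 0 that Java compared against null corresponds to
0001-01-01T00:00:00Z.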



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
