hudi-bot opened a new issue, #17254: URL: https://github.com/apache/hudi/issues/17254
ORC tests fail on Spark 3.5 in Azure CI due to setup issues. These tests are temporarily disabled; we should fix and re-enable them.

## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8081
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9113
- Fix version(s): 1.1.0

---

## Comments

**07/Oct/24 23:03 — yihua:**

Error stacktrace:

```java
java.io.IOException: Problem adding row to file:/var/folders/60/wk8qzx310fd32b2dp7mhzvdc0000gn/T/junit1796010661885682116/orcFiles/1.orc
	at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:761)
	at org.apache.hudi.utilities.testutils.UtilitiesTestBase$Helpers.saveORCToDFS(UtilitiesTestBase.java:446)
	at org.apache.hudi.utilities.testutils.UtilitiesTestBase$Helpers.saveORCToDFS(UtilitiesTestBase.java:434)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.prepareORCDFSFiles(HoodieDeltaStreamerTestBase.java:444)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.prepareORCDFSFiles(HoodieDeltaStreamerTestBase.java:432)
	at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testORCDFSSource(TestHoodieDeltaStreamer.java:1799)
	at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testORCDFSSourceWithoutSchemaProviderAndNoTransformer(TestHoodieDeltaStreamer.java:2220)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.isSelectedInUse()Z
	at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:710)
```

---

**10/Oct/24 17:46 — linliu:**

After upgrading to Hive >= 3.x, we saw the following error:

```java
java.lang.ClassCastException: org.apache.hadoop.hive.thrift.TUGIContainingTransport cannot be cast to org.apache.hadoop.hive.metastore.security.TUGIContainingTransport
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:177) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:74) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [libthrift-0.9.3.jar:0.9.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_392-internal]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_392-internal]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_392-internal]
```

---

**10/Oct/24 17:49 — linliu:**

The reason is that for Hive >= 3.x, both of these classes ship in the same jar:

```java
org.apache.hadoop.hive.thrift.TUGIContainingTransport
org.apache.hadoop.hive.metastore.security.TUGIContainingTransport
```

There is no easy way to exclude either of the two classes, and we don't know whether their methods differ. Therefore, I decided to postpone this ticket and revisit it after we finish higher-priority tasks.

---

**10/Oct/24 18:29 — yihua:**

Thanks for the findings. Here are the tests that are disabled due to the ORC and Hive dependency issues:
* testORCDFSSourceWithoutSchemaProviderAndNoTransformer, testORCDFSSourceWithSchemaProviderAndWithTransformer
* TestHoodieSnapshotExporter with ORC

As long as we validate that the Hudi streamer can read an ORC DFS source using spark-submit and that HoodieSnapshotExporter can export ORC format, we are good, i.e., this is a test setup issue which we can tackle later.

---

**13/Oct/24 23:01 — yihua:**

I'm deferring this task to Hudi 1.1.
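linliu's finding that both `TUGIContainingTransport` classes ship in a single artifact can be confirmed by simply listing the jar's entries. The sketch below simulates this with a stand-in archive (since `hive-standalone-metastore-3.1.3.jar` may not be at hand); against the real jar the equivalent check would be `python3 -m zipfile -l hive-standalone-metastore-3.1.3.jar | grep TUGIContainingTransport`. All paths here are illustrative:

```shell
# Sketch only: build a stand-in jar containing the two class entries that
# hive-standalone-metastore >= 3.x bundles together, then list its contents.
workdir=$(mktemp -d)
mkdir -p "$workdir/org/apache/hadoop/hive/thrift" \
         "$workdir/org/apache/hadoop/hive/metastore/security"
touch "$workdir/org/apache/hadoop/hive/thrift/TUGIContainingTransport.class" \
      "$workdir/org/apache/hadoop/hive/metastore/security/TUGIContainingTransport.class"
(cd "$workdir" && python3 -m zipfile -c metastore-standin.jar org/)

# Both copies live in one artifact, so a Maven <exclusion> (which operates at
# artifact granularity) cannot drop one class without losing the other.
python3 -m zipfile -l "$workdir/metastore-standin.jar" | grep 'TUGIContainingTransport.class'
```

This is why a plain dependency exclusion is a dead end here; separating the two copies would require something heavier such as shading/relocation.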

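The manual validation yihua suggests (reading an ORC DFS source with the Hudi streamer via spark-submit) could look roughly like the following. This is a non-runnable sketch, not taken from the issue: the bundle filename, table name, paths, and the `ts` ordering field are all placeholders to be replaced with real values.

```shell
# Illustrative only: versions, paths, and property values are placeholders.
# The properties file would set hoodie.deltastreamer.source.dfs.root to the
# directory containing the ORC files, plus record key / partition configs.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle_2.12-<version>.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.ORCDFSSource \
  --source-ordering-field ts \
  --target-base-path file:///tmp/hudi/orc_dfs_table \
  --target-table orc_dfs_table \
  --props file:///tmp/orc-dfs-source.properties
```

If this job ingests the ORC data successfully (and a similar manual run shows HoodieSnapshotExporter exporting ORC), the failures above are confined to the test setup, as the comments conclude.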