Hi @Pratyaksh Sharma,
Okay, all right. BTW, thanks for raising this issue.

best,
lamber-ken

On 01/2/2020 13:47, Pratyaksh Sharma <pratyaks...@gmail.com> wrote:

Hi Lamberken,

I am also trying to fix this issue. Please let us know if you come up with anything.

On Thu, Jan 2, 2020 at 11:12 AM lamberken <lamber...@163.com> wrote:

Hi @Vinoth,

Got it, thank you for reminding me. I just made a mistake just now.

best,
lamber-ken

On 01/2/2020 13:08, Vinoth Chandar <vin...@apache.org> wrote:

Hi Lamber,

utilities-bundle has always been a fat jar. I was talking about hudi-utilities. Sure, take a swing at it. Happy to help as needed.

On Wed, Jan 1, 2020 at 8:57 PM lamberken <lamber...@163.com> wrote:

Hi @Vinoth,

I'm willing to solve this problem. I'm trying to find out from the git history when hudi-utilities-bundle stopped being a fat jar:

    2019-08-29  FAT JAR      ---> 5f9fa82f47e1cc14a22b869250fe23c8f9c033cd
    2019-09-14  NOT FAT JAR  ---> d2525c31b7dad7bae2d4899d8df2a353ca39af50

best,
lamber-ken

At 2020-01-01 09:15:01, "Vinoth Chandar" <vin...@apache.org> wrote:

This does sound like a fair bit of pain. I am wondering if it makes sense to change the integ-test setup/docker demo to use the incremental puller. A bunch of the packaging issues around jars seem like regressions; is hudi-utilities not a fat jar anymore? If there aren't any takers, I can also try my hand at fixing this once I get done with a few things on my end. Left a comment on HUDI-485.

On Tue, Dec 31, 2019 at 4:19 PM lamberken <lamber...@163.com> wrote:

Hi @Pratyaksh Sharma,

Thanks for your detailed stack trace and reproduction steps. Your suggestion is reasonable.

1. For the NPE issue, please track PR #1167 <https://github.com/apache/incubator-hudi/pull/1167>.
2. For the TTransportException issue, one question: can any statements other than the create statement be executed?

best,
lamber-ken

At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:

Thank you Lamberken, the above issue gets resolved with what you suggested.
However, HiveIncrementalPuller is still not working. Subsequently I found and fixed a bug raised here - https://issues.apache.org/jira/browse/HUDI-485. Currently I am facing the below exception when trying to run the create table statement on the docker cluster. Any leads on solving this are welcome -

6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller - Exception when executing SQL
java.sql.SQLException: org.apache.thrift.transport.TTransportException
    at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
    at org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
    at org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
    at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
    at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
    at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
    at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
    ... 5 more

6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller - Could not close the resultset opened
java.sql.SQLException: org.apache.thrift.transport.TTransportException
    at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
    at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
    at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
    at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
    at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
    at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
    ... 3 more

Also, the documentation does not mention the jars which need to be passed externally on the classpath to run the above tool. We should update the documentation to list those jars so that it becomes easier for a new user to use this tool; I spent a lot of time adding all the jars incrementally. This jira (https://issues.apache.org/jira/browse/HUDI-486) tracks it.

On Mon, Dec 30, 2019 at 5:35 PM lamberken <lamber...@163.com> wrote:

Hi @Pratyaksh Sharma,

Thanks for your steps to reproduce this issue. Try modifying the code below and test again.
In org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller, change

    String templateContent =
        FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));

to

    String templateContent =
        FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));

best,
lamber-ken

At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:

Hi Vinoth,

I am able to reproduce this error on the docker setup and have filed a jira - https://issues.apache.org/jira/browse/HUDI-484. Steps to reproduce are in the jira description.

On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <pratyaks...@gmail.com> wrote:

Hi Vinoth,

I will try to reproduce the error on the docker cluster and keep you updated.

On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <vin...@apache.org> wrote:

Pratyaksh,

If you are still having this issue, could you try reproducing it on the docker setup, similar to https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table, and raise a JIRA? Happy to look into it and get it fixed if needed.

Thanks,
Vinoth

On Tue, Dec 24, 2019 at 8:43 AM lamberken <lamber...@163.com> wrote:

Hi @Pratyaksh Sharma,

The log4j-1.2.17.jar library also needs to be added to the classpath, for example:

    java -cp /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar org.apache.hudi.utilities.HiveIncrementalPuller --help

best,
lamber-ken

At 2019-12-24 17:23:20, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:

Hi Vinoth,

Sorry, my bad, I did not realise earlier that Spark is not needed for this class.
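For reference on the fix above: Class.getResourceAsStream resolves a name without a leading '/' relative to the calling class's package, while a leading '/' makes it resolve from the classpath root. A minimal sketch of that resolution rule (ResourcePathDemo and its resolve method are a hypothetical illustration, not Hudi code):

```java
public class ResourcePathDemo {

    // Mirrors the rule Class.getResourceAsStream applies when resolving a
    // resource name: a leading '/' means "look up from the classpath root";
    // otherwise the name is prefixed with the calling class's package path.
    static String resolve(String packagePath, String name) {
        if (name.startsWith("/")) {
            return name.substring(1);
        }
        return packagePath + "/" + name;
    }

    public static void main(String[] args) {
        String pkg = "org/apache/hudi/utilities";
        // Without the slash, the template is searched for inside the package
        // directory, where the bundle does not place it, so the lookup
        // returns null and readAsUTFString fails with an NPE.
        System.out.println(resolve(pkg, "IncrementalPull.sqltemplate"));
        // With the slash, it is searched for at the classpath root, which
        // is where the bundle stores the template.
        System.out.println(resolve(pkg, "/IncrementalPull.sqltemplate"));
    }
}
```

This is why only the leading-slash variant finds the template once it is packaged at the root of the bundle jar.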
I tried running it with the below command and got the following exception.

Command:

    java -cp /path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar org.apache.hudi.utilities.HiveIncrementalPuller --help

Exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/LogManager
    at org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

I was able to fix it by including the corresponding jar in the bundle. After fixing the above, I am still getting the NPE even though the template is bundled in the jar.

On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <vin...@apache.org> wrote:

Hi Pratyaksh,

HiveIncrementalPuller is just a Java program. It does not need Spark, since it just runs HiveQL remotely. On the error you specified, it seems like it can't find the template? Can you check whether the bundle is missing the template file? This may have got broken during the bundling changes, since the template is no longer part of the resources folder of the bundle module. We should also probably throw a better error than an NPE. We can raise a JIRA once you confirm.

    String templateContent =
        FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));

On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <pratyaks...@gmail.com> wrote:

Hi,

Can someone guide me or share some documentation on how to use HiveIncrementalPuller? I already went through the documentation at https://hudi.apache.org/querying_data.html. I tried running the puller with the below command and am facing the given exception. Any leads are appreciated.
Command:

    spark-submit --name incremental-puller --queue etl --files incremental_sql.txt --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 4g --num-executors 2 --class org.apache.hudi.utilities.HiveIncrementalPuller hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass <pass> --extractSQLFile incremental_sql.txt --sourceDb <source_db> --sourceTable <src_table> --targetDb tmp --targetTable tempTable --fromCommitTime 0 --maxCommits 1

Error:

java.lang.NullPointerException
    at org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
    at org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
    at org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
    at org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
    at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)
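The NullPointerException above occurs because FileIOUtils.readAsUTFString is handed a null stream when getResourceAsStream cannot find the template in the jar. As suggested in the thread, a guard would turn the opaque NPE into an actionable message; a hedged sketch (TemplateLoadDemo and requireResource are hypothetical, not the actual Hudi code):

```java
import java.io.InputStream;

public class TemplateLoadDemo {

    // Hypothetical helper: fail fast with a descriptive message instead of
    // letting a null stream surface later as a NullPointerException deep
    // inside the file-reading utilities.
    static InputStream requireResource(Class<?> cls, String name) {
        InputStream in = cls.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalStateException("Resource " + name
                + " not found on the classpath; check that it is packaged in the bundle jar");
        }
        return in;
    }

    public static void main(String[] args) {
        try {
            // The template is not on this demo's classpath, so the guard fires.
            requireResource(TemplateLoadDemo.class, "/IncrementalPull.sqltemplate");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, a user who forgot a jar or hit a bundling regression would see which resource is missing instead of a bare NPE from FileIOUtils.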