Hi Vinoth Chandar / Pratyaksh Sharma,
I reset many commits in git and checked whether HiveIncrementalPuller works normally. It seems that
HiveIncrementalPuller has been working abnormally for a long time. For detailed reproduction steps,
please visit HUDI-486 <https://issues.apache.org/jira/browse/HUDI-486>

best,
lamber-ken

At 2020-01-01 09:15:01, "Vinoth Chandar" <vin...@apache.org> wrote:
>This does sound like a fair bit of pain.
>I am wondering if it makes sense to change the integ-test setup/docker demo
>to use the incremental puller. A bunch of the packaging issues around jars
>seem like regressions from hudi-utilities no longer being a fat jar?
>
>If there aren't any takers, I can also try my hand at fixing this, once I
>get done with a few things on my end. Left a comment on HUDI-485.
>
>On Tue, Dec 31, 2019 at 4:19 PM lamberken <lamber...@163.com> wrote:
>
>> Hi @Pratyaksh Sharma,
>>
>> Thanks for your detailed stacktrace and reproduction steps. Your
>> suggestion is reasonable.
>>
>> 1. For the NPE issue, please track PR #1167
>> <https://github.com/apache/incubator-hudi/pull/1167>
>> 2. For the TTransportException issue, I have a question: can other
>> statements be executed except the create statement?
>>
>> best,
>> lamber-ken
>>
>> At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:
>> >Thank you Lamberken, the above issue gets resolved with what you
>> >suggested.
>> >However, HiveIncrementalPuller is still not working.
>> >Subsequently I found and fixed a bug raised here -
>> >https://issues.apache.org/jira/browse/HUDI-485.
>> >
>> >Currently I am facing the below exception when trying to run the create
>> >table statement on the docker cluster. Any leads for solving this are
>> >welcome -
>> >
>> >6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller - Exception when executing SQL
>> >
>> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
>> >  at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
>> >  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>> >  at org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
>> >  at org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
>> >  at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
>> >  at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
>> >Caused by: org.apache.thrift.transport.TTransportException
>> >  at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>> >  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>> >  at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
>> >  at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
>> >  at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
>> >  at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
>> >  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>> >  at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
>> >  at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
>> >  at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
>> >  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>> >  at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
>> >  at org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
>> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >  at java.lang.reflect.Method.invoke(Method.java:498)
>> >  at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
>> >  at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
>> >  at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
>> >  ... 5 more
>> >
>> >6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller - Could not close the resultset opened
>> >
>> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
>> >  at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
>> >  at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
>> >  at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
>> >  at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
>> >Caused by: org.apache.thrift.transport.TTransportException
>> >  at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>> >  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>> >  at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
>> >  at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
>> >  at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
>> >  at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
>> >  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>> >  at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
>> >  at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
>> >  at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
>> >  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>> >  at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
>> >  at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
>> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >  at java.lang.reflect.Method.invoke(Method.java:498)
>> >  at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
>> >  at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
>> >  at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
>> >  ... 3 more
>> >
>> >Also the documentation does not mention the jars which need to be passed
>> >externally on the classpath for executing the above tool. We should update
>> >the documentation to list the jars so that it becomes easier for a new user
>> >to use this tool. I spent a lot of time adding all the jars incrementally.
>> >This jira (https://issues.apache.org/jira/browse/HUDI-486) tracks this.
>> >
>> >On Mon, Dec 30, 2019 at 5:35 PM lamberken <lamber...@163.com> wrote:
>> >
>> >> Hi @Pratyaksh Sharma,
>> >>
>> >> Thanks for your steps to reproduce this issue. Try modifying the code
>> >> below, and test again.
>> >>
>> >> org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller
>> >> / --------------------------------- /
>> >> String templateContent =
>> >>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
>> >>
>> >> Changed to
>> >>
>> >> / --------------------------------- /
>> >> String templateContent =
>> >>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
>> >>
>> >> best,
>> >> lamber-ken
>> >>
>> >> At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:
>> >> >Hi Vinoth,
>> >> >
>> >> >I am able to reproduce this error on the docker setup and have filed a
>> >> >jira - https://issues.apache.org/jira/browse/HUDI-484.
>> >> >
>> >> >Steps to reproduce are mentioned in the jira description itself.
>> >> >
>> >> >On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <pratyaks...@gmail.com>
>> >> >wrote:
>> >> >
>> >> >> Hi Vinoth,
>> >> >>
>> >> >> I will try to reproduce the error on the docker cluster and keep you
>> >> >> updated.
>> >> >>
>> >> >> On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <vin...@apache.org> wrote:
>> >> >>
>> >> >>> Pratyaksh,
>> >> >>>
>> >> >>> If you are still having this issue, could you try reproducing this on
>> >> >>> the docker setup
>> >> >>> https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
>> >> >>> similar to this and raise a JIRA.
>> >> >>> Happy to look into it and get it fixed if needed.
>> >> >>>
>> >> >>> Thanks
>> >> >>> Vinoth
>> >> >>>
>> >> >>> On Tue, Dec 24, 2019 at 8:43 AM lamberken <lamber...@163.com> wrote:
>> >> >>>
>> >> >>> > Hi, @Pratyaksh Sharma
>> >> >>> >
>> >> >>> > The log4j-1.2.17.jar lib also needs to be added to the classpath,
>> >> >>> > for example:
>> >> >>> > java -cp
>> >> >>> > /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
>> >> >>> > org.apache.hudi.utilities.HiveIncrementalPuller --help
>> >> >>> >
>> >> >>> > best,
>> >> >>> > lamber-ken
>> >> >>> >
>> >> >>> > At 2019-12-24 17:23:20, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:
>> >> >>> > >Hi Vinoth,
>> >> >>> > >
>> >> >>> > >Sorry my bad, I did not realise earlier that spark is not needed
>> >> >>> > >for this class.
>> >> >>> > >I tried running it with the below command to get the mentioned
>> >> >>> > >exception -
>> >> >>> > >
>> >> >>> > >Command -
>> >> >>> > >
>> >> >>> > >java -cp
>> >> >>> > >/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
>> >> >>> > >org.apache.hudi.utilities.HiveIncrementalPuller --help
>> >> >>> > >
>> >> >>> > >Exception -
>> >> >>> > >Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/LogManager
>> >> >>> > >  at org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
>> >> >>> > >Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
>> >> >>> > >  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>> >> >>> > >  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >> >>> > >  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>> >> >>> > >  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >> >>> > >  ... 1 more
>> >> >>> > >
>> >> >>> > >I was able to fix it by including the corresponding jar in the bundle.
>> >> >>> > >
>> >> >>> > >After fixing the above, I am still getting the NPE even though the
>> >> >>> > >template is bundled in the jar.
>> >> >>> > >
>> >> >>> > >On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <vin...@apache.org> wrote:
>> >> >>> > >
>> >> >>> > >> Hi Pratyaksh,
>> >> >>> > >>
>> >> >>> > >> HiveIncrementalPuller is just a java program. It does not need
>> >> >>> > >> Spark, since it just runs a HiveQL remotely..
>> >> >>> > >>
>> >> >>> > >> On the error you specified, seems like it can't find the template?
>> >> >>> > >> Can you see if the bundle does not have the template file.. Maybe
>> >> >>> > >> this got broken during the bundling changes.. (since it's no longer
>> >> >>> > >> part of the resources folder of the bundle module).. We should also
>> >> >>> > >> probably be throwing a better error than NPE..
>> >> >>> > >>
>> >> >>> > >> We can raise a JIRA, once you confirm.
>> >> >>> > >>
>> >> >>> > >> String templateContent =
>> >> >>> > >>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
>> >> >>> > >>
>> >> >>> > >> On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <pratyaks...@gmail.com>
>> >> >>> > >> wrote:
>> >> >>> > >>
>> >> >>> > >> > Hi,
>> >> >>> > >> >
>> >> >>> > >> > Can someone guide me or share some documentation regarding how to
>> >> >>> > >> > use HiveIncrementalPuller. I already went through the documentation
>> >> >>> > >> > on https://hudi.apache.org/querying_data.html. I tried using this
>> >> >>> > >> > puller using the below command and am facing the given exception.
>> >> >>> > >> >
>> >> >>> > >> > Any leads are appreciated.
>> >> >>> > >> > >> >> >>> > >> > Command - >> >> >>> > >> > spark-submit --name incremental-puller --queue etl --files >> >> >>> > >> > incremental_sql.txt --master yarn --deploy-mode cluster >> >> >>> > --driver-memory >> >> >>> > >> 4g >> >> >>> > >> > --executor-memory 4g --num-executors 2 --class >> >> >>> > >> > org.apache.hudi.utilities.HiveIncrementalPuller >> >> >>> > >> > hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl >> >> >>> > >> > jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass <pass> >> >> >>> > >> > --extractSQLFile incremental_sql.txt --sourceDb <source_db> >> >> >>> > --sourceTable >> >> >>> > >> > <src_table> --targetDb tmp --targetTable tempTable >> >> >>> --fromCommitTime 0 >> >> >>> > >> > --maxCommits 1 >> >> >>> > >> > >> >> >>> > >> > Error - >> >> >>> > >> > >> >> >>> > >> > java.lang.NullPointerException >> >> >>> > >> > at >> >> >>> org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73) >> >> >>> > >> > at >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66) >> >> >>> > >> > at >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61) >> >> >>> > >> > at >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113) >> >> >>> > >> > at >> >> >>> > >> > >> >> >>> > >> > >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343) >> >> >>> > >> > >> >> >>> > >> >> >> >>> > >> >> >>> >> >> >> >> >> >>