Thank you Lamberken, the above issue is resolved with what you suggested.
However, HiveIncrementalPuller is still not working. Subsequently I found
and fixed a bug raised here -
https://issues.apache.org/jira/browse/HUDI-485.
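
For anyone else who hit the original NPE: lamber-ken's one-character fix
quoted below works because of standard Java resource lookup rules -
Class#getResourceAsStream without a leading '/' resolves relative to the
class's own package, while a leading '/' resolves from the classpath root,
which is where the template sits in the bundle jar. A minimal sketch
(ResourcePathDemo is just a hypothetical scratch class):

    import java.io.InputStream;

    public class ResourcePathDemo {
        public static void main(String[] args) {
            ResourcePathDemo demo = new ResourcePathDemo();
            // No leading '/': resolved relative to this class's package.
            InputStream relative =
                demo.getClass().getResourceAsStream("IncrementalPull.sqltemplate");
            // Leading '/': resolved from the classpath root.
            InputStream absolute =
                demo.getClass().getResourceAsStream("/IncrementalPull.sqltemplate");
            // Each prints null when the resource is not found on that path.
            System.out.println("relative: " + relative + ", absolute: " + absolute);
        }
    }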

Currently I am facing the below exception when trying to run the create
table statement on the docker cluster. Any leads on solving this are welcome -

6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  - Exception when executing SQL
java.sql.SQLException: org.apache.thrift.transport.TTransportException
        at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
        at org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
        at org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
        at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
        at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
        at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
        at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
        ... 5 more

6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  - Could not close the resultset opened
java.sql.SQLException: org.apache.thrift.transport.TTransportException
        at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
        at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
        at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
        at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
        at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
        at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
        at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
        ... 3 more
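
In case it helps anyone narrow this down: a TTransportException at this
layer usually means the Thrift connection to HiveServer2 broke mid-call
(server went away, wrong port, or a transport/auth mismatch between client
and server). A bare JDBC probe along these lines can rule out basic
connectivity first - the URL and credentials here assume the docker demo
setup and may need adjusting, and it needs the same hive-jdbc jar on the
classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Scratch class: verify HiveServer2 is reachable over JDBC before
    // involving HiveIncrementalPuller at all.
    public class HiveConnCheck {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000", "hive", "hive");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("show databases")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If this simple query also dies with a TTransportException, the problem is
in the connection settings rather than in the puller itself.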

Also, the documentation does not mention the jars that need to be added to
the classpath for running this tool. We should update the documentation to
list those jars so that it becomes easier for a new user to get started; I
spent a lot of time adding the jars one by one. This jira
(https://issues.apache.org/jira/browse/HUDI-486) tracks this.
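
For reference until the docs are updated: lamber-ken's example later in
this thread shows the shape of the classpath that was needed on my setup
(exact jar versions will differ per environment):

    java -cp /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
        org.apache.hudi.utilities.HiveIncrementalPuller --help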

On Mon, Dec 30, 2019 at 5:35 PM lamberken <lamber...@163.com> wrote:

>
> Hi @Pratyaksh Sharma
>
>
> Thanks for your steps to reproduce this issue. Try modifying the code
> below and test again.
>
> In org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller:
>
> String templateContent =
>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
>
> changed to
>
> String templateContent =
>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
>
> best,
> lamber-ken
>
> At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:
> >Hi Vinoth,
> >
> >I am able to reproduce this error on docker setup and have filed a jira -
> >https://issues.apache.org/jira/browse/HUDI-484.
> >
> >Steps to reproduce are mentioned in the jira description itself.
> >
> >On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <pratyaks...@gmail.com>
> >wrote:
> >
> >> Hi Vinoth,
> >>
> >> I will try to reproduce the error on docker cluster and keep you
> >> updated.
> >>
> >> On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <vin...@apache.org>
> >> wrote:
> >>
> >>> Pratyaksh,
> >>>
> >>> If you are still having this issue, could you try reproducing this on
> >>> the docker setup
> >>> https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
> >>> similar to this and raise a JIRA.
> >>> Happy to look into it and get it fixed if needed
> >>>
> >>> Thanks
> >>> Vinoth
> >>>
> >>> On Tue, Dec 24, 2019 at 8:43 AM lamberken <lamber...@163.com> wrote:
> >>>
> >>> >
> >>> >
> >>> > Hi, @Pratyaksh Sharma
> >>> >
> >>> >
> >>> > The log4j-1.2.17.jar lib also needs to be added to the classpath, for
> >>> > example:
> >>> > java -cp
> >>> > /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >>> > org.apache.hudi.utilities.HiveIncrementalPuller --help
> >>> >
> >>> >
> >>> > best,
> >>> > lamber-ken
> >>> >
> >>> > At 2019-12-24 17:23:20, "Pratyaksh Sharma" <pratyaks...@gmail.com>
> >>> > wrote:
> >>> > >Hi Vinoth,
> >>> > >
> >>> > >Sorry my bad, I did not realise earlier that spark is not needed for
> >>> > >this class. I tried running it with the below command to get the
> >>> > >mentioned exception -
> >>> > >
> >>> > >Command -
> >>> > >
> >>> > >java -cp
> >>> > >/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >>> > >org.apache.hudi.utilities.HiveIncrementalPuller --help
> >>> > >
> >>> > >Exception -
> >>> > >Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/LogManager
> >>> > >        at org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
> >>> > >Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
> >>> > >        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >>> > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>> > >        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> >>> > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>> > >        ... 1 more
> >>> > >
> >>> > >I was able to fix it by including the corresponding jar in the
> >>> > >bundle.
> >>> > >
> >>> > >After fixing the above, I am still getting the NPE even though the
> >>> > >template is bundled in the jar.
> >>> > >
> >>> > >On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <vin...@apache.org>
> >>> > >wrote:
> >>> > >
> >>> > >> Hi Pratyaksh,
> >>> > >>
> >>> > >> HiveIncrementalPuller is just a java program. Does not need Spark,
> >>> > >> since it just runs a HiveQL remotely..
> >>> > >>
> >>> > >> On the error you specified, seems like it can't find the template?
> >>> > >> Can you see if the bundle does not have the template file.. Maybe
> >>> > >> this got broken during the bundling changes.. (since it's no longer
> >>> > >> part of the resources folder of the bundle module).. We should also
> >>> > >> probably be throwing a better error than NPE..
> >>> > >>
> >>> > >> We can raise a JIRA, once you confirm.
> >>> > >>
> >>> > >> String templateContent =
> >>> > >>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
> >>> > >>
> >>> > >>
> >>> > >> On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <pratyaks...@gmail.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > Hi,
> >>> > >> >
> >>> > >> > Can someone guide me or share some documentation regarding how to
> >>> > >> > use HiveIncrementalPuller? I already went through the documentation
> >>> > >> > on https://hudi.apache.org/querying_data.html. I tried using this
> >>> > >> > puller with the below command and am facing the given exception.
> >>> > >> >
> >>> > >> > Any leads are appreciated.
> >>> > >> >
> >>> > >> > Command -
> >>> > >> > spark-submit --name incremental-puller --queue etl --files
> >>> > >> > incremental_sql.txt --master yarn --deploy-mode cluster
> >>> > >> > --driver-memory 4g --executor-memory 4g --num-executors 2 --class
> >>> > >> > org.apache.hudi.utilities.HiveIncrementalPuller
> >>> > >> > hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl
> >>> > >> > jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass <pass>
> >>> > >> > --extractSQLFile incremental_sql.txt --sourceDb <source_db>
> >>> > >> > --sourceTable <src_table> --targetDb tmp --targetTable tempTable
> >>> > >> > --fromCommitTime 0 --maxCommits 1
> >>> > >> >
> >>> > >> > Error -
> >>> > >> >
> >>> > >> > java.lang.NullPointerException
> >>> > >> > at org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
> >>> > >> > at org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
> >>> > >> > at org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
> >>> > >> > at org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
> >>> > >> > at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)
