Thanks, Kabeer, for helping me with the doc. I am currently looking into it and
trying multiple options; if I come across anything, I will keep you posted.

@Vinoth
As suggested, I have whitelisted the parquet dependency in the
above-mentioned pom file, but I still get the same error message.
As of now, I am directly using the Hudi-Utilities jar; there is no
separate project of my own.
Also, with respect to Hive, I am aware of that. As of now I have shared
the Hive URL of 1.x, which is used by CDH, in the config.
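
One way to confirm whether the whitelisting actually took effect is to list the rebuilt bundle and look for the missing class. The snippet below is only a minimal sketch of that check: the helper names `class_entry_path` and `listing_has_class` are hypothetical (they are not part of Hudi or the JDK), and the jar path in the usage comment is the one from the spark2-submit command earlier in this thread. It relies only on `jar tf`, which prints one entry path per line.

```shell
#!/bin/sh
# Hypothetical helper: turn a fully qualified class name into the entry path
# that the class would have inside a jar (dots become slashes, plus ".class").
class_entry_path() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: read a jar listing on stdin (one entry per line, as
# printed by `jar tf`) and succeed if any entry contains the class path.
# The match is a fixed-string substring match, so a copy of the class that was
# relocated under another package prefix during shading also counts.
listing_has_class() {
  grep -Fq -- "$(class_entry_path "$1")"
}

# Usage (assumes the bundle path from the spark2-submit command):
#   jar tf /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
#     | listing_has_class org.apache.parquet.hadoop.metadata.CompressionCodecName \
#     && echo "class is bundled" || echo "class is missing"
```

If the check reports the class as missing after a rebuild, the whitelist entry was probably not picked up by the bundle build; if it reports the class as present and the error persists, the classpath on the cluster is the more likely suspect.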

Again attaching the logs for reference.

Regards,
*Shahida R. Khan*


On Tue, Oct 15, 2019 at 9:22 PM Vinoth Chandar <[email protected]> wrote:

> > I have added both the dependency and tried too.
> If you are trying to get the hudi-utilities bundle to include a jar, then
> you also need to whitelist it explicitly here
>
> https://github.com/apache/incubator-hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L67
>
>
> Heads up : you may hit issues with Hive since CDH hive is still 1.x (world
> is moving to Hive 3+ slowly, all other cloud/distro vendors are on Hive
> 2.x).
>
> On Tue, Oct 15, 2019 at 8:33 AM Kabeer Ahmed <[email protected]> wrote:
>
> > Shahida
> >
> > Thanks for trying out various options. I do not work on the CDH platform.
> > So I am hoping that someone with Cloudera platform will help you with the
> > jar. If I remember right there are CDH jars that are deployed under
> > /etc/cloudera/cdh/version and you should be able to find the jar path to
> > include on your classpath. I can also see that this file is referenced in
> > cloudera documentation at:
> >
> https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_parquet.html
> > (this is the same CDH 5.14 that you are using). Please try to search for
> > the jar with the name: parquet-hadoop-1.8.*.jar.
> > On another thread (https://github.com/bigdatagenomics/adam/issues/1742)
> > I also see an option of using: --packages
> > org.apache.parquet:parquet-hadoop:1.8.2.
> > If nothing works then we can use the brute force method of downloading the
> > jar manually from the link below and placing it on the classpath.
> >
> > https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop/1.8.3
> >
> > Can you please try these options and report back with your observations?
> > I do understand why the parquet-hadoop dependency in the pom.xml didn't do
> > the magic: it seems the hudi code doesn't have a real dependency on it. I
> > still need to check the dependency tree to see whether this jar is included
> > through parquet-avro.
> > There are loads of users from companies like Uber on this thread who use
> > hudi on CDH. Someone must have a solution for your issue.
> > On Oct 15 2019, at 2:51 pm, Shahida Khan <[email protected]
> .INVALID>
> > wrote:
> > > Hi Kabeer,
> > >
> > > I have added both the dependency and tried too.
> > > Just a version change: I have used parquet-hadoop 1.8.1 since
> > > parquet-avro is 1.8.1.
> > > It looks like this:
> > >
> > > <parquet.version>1.8.1</parquet.version>
> > >
> > > <dependency>
> > > <groupId>org.apache.parquet</groupId>
> > > <artifactId>parquet-avro</artifactId>
> > > <version>${parquet.version}</version>
> > > <!-- <scope>provided</scope> -->
> > > </dependency>
> > > <dependency>
> > > <groupId>org.apache.parquet</groupId>
> > > <artifactId>parquet-hadoop</artifactId>
> > > <version>${parquet.version}</version>
> > > </dependency>
> > >
> > >
> > > Regards,
> > > *Shahida R. Khan*
> > >
> > >
> > > On Tue, Oct 15, 2019 at 7:12 PM Kabeer Ahmed <[email protected]>
> > wrote:
> > > > Thank you Shahida. Can you please confirm that you have included both the
> > > > below dependencies and tried the build?
> > > >
> > > > If your build is missing parquet-hadoop, then the required class may not
> > > > be found. If you have already included the below dependencies and it still
> > > > doesn't work, I can upload a jar for you to try.
> > > > <dependency>
> > > > <groupId>org.apache.parquet</groupId>
> > > > <artifactId>parquet-avro</artifactId>
> > > > <version>${parquet.version}</version>
> > > > <scope>provided</scope>
> > > > </dependency>
> > > >
> > > > <dependency>
> > > > <groupId>org.apache.parquet</groupId>
> > > > <artifactId>parquet-hadoop</artifactId>
> > > > <version>1.8.3</version>
> > > > </dependency>
> > > > On Oct 15 2019, at 2:28 pm, Shahida Khan <
> [email protected]
> > .INVALID>
> > > > wrote:
> > > > > Hi Kabeer,
> > > > >
> > > > > Thank you for the quick response!
> > > > > Also, our project already includes the below dependency, which I believe
> > > > > should include "org.apache.parquet.parquet-hadoop":
> > > > >
> > > > > <dependency>
> > > > > <groupId>org.apache.parquet</groupId>
> > > > > <artifactId>parquet-avro</artifactId>
> > > > > <version>${parquet.version}</version>
> > > > > </dependency>
> > > > >
> > > > > I have even checked with ```jar -tvf shahida.jar | grep -i CompressionCodecName```;
> > > > > the class is not available in the jar even after including it in the build.
> > > > >
> > > > > The strange thing is, I have even provided the parquet-avro jar via
> > > > > spark-submit, and it behaves differently for 1.7 and 1.8.
> > > > > It seems like there is some configuration missing with respect to
> > > > > HoodieStorageConfig.PARQUET_COMPRESSION_CODEC.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > > Shahida R. Khan
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Oct 15, 2019 at 4:24 PM Kabeer Ahmed <[email protected]
> > > > (mailto:[email protected])> wrote:
> > > > > > Shahida
> > > > > >
> > > > > > Welcome to Hudi. I am not an expert with DeltaStreamer as I do not use
> > > > > > it. In general, I think this points to an issue with the build of the
> > > > > > fat jar. It looks to me that either you didn't build the fat jar to
> > > > > > include all the dependencies or your classpath didn't include the jar
> > > > > > needed.
> > > > > > For some reason I didn't receive the full stack trace attachment.
> > > > > > Either you forgot to attach it or the mail system blocked it.
> > > > > > Can you please check that your pom has the dependency shown below:
> > > > > > <!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop -->
> > > > > > <dependency>
> > > > > > <groupId>org.apache.parquet</groupId>
> > > > > > <artifactId>parquet-hadoop</artifactId>
> > > > > > <version>1.8.3</version>
> > > > > > </dependency>
> > > > > >
> > > > > > Can you also run ```jar -tvf shahida.jar | grep -i CompressionCodecName```
> > > > > > and let us know the output that you see.
> > > > > > Once we have the answers to the above, we can see what is missing and
> > > > > > hopefully address that.
> > > > > > Kabeer.
> > > > > > On Oct 15 2019, at 10:39 am, Shahida Khan <[email protected]
> > > > >
> > > >
> > > > (mailto:[email protected])> wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Hope you are doing well.
> > > > > > > I am currently trying to implement the Hudi Utilities using Delta
> > > > > > > Streamer. Below is the command line configuration I am passing:
> > > > > > >
> > > > > > > spark2-submit --master yarn --deploy-mode cluster --class
> > > > > > > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
> > > > > > > /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --props
> > > > > > > /user/oozie/dataops/hoodie/config.properties --schemaprovider-class
> > > > > > > org.apache.hudi.utilities.schema.SchemaRegistryProvider --source-class
> > > > > > > org.apache.hudi.utilities.sources.AvroKafkaSource --source-ordering-field
> > > > > > > LastModified_dtmStamp
> > > > > > > --target-base-path /tmp/hudi-deltastreamer-op_TEST --target-table
> > > > > > > testTableHoodie --op UPSERT --enable-hive-sync --storage-type MERGE_ON_READ
> > > > > > >
> > > > > > > Also, I have attached the config file.
> > > > > > > Unfortunately, while writing the files in parquet, it throws an
> > > > > > > exception: "java.lang.NoClassDefFoundError:
> > > > > > > org/apache/parquet/hadoop/metadata/CompressionCodecName"
> > > > > > > The full error trace has been attached for your reference.
> > > > > > >
> > > > > > > There are a few warnings with respect to configuration, but I am not
> > > > > > > sure if that's the problem.
> > > > > > >
> > > > > > > I have tried giving the classpath as well. I am not sure what I am
> > > > > > > missing here.
> > > > > > > It would be great if anybody could help me here.
> > > > > > >
> > > > > > > Hadoop version :- 2.6.0-cdh5.14.2
> > > > > > > Spark version :- 2.3.0.cloudera2
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Shahida R. Khan
> > > > > > > +91 9167538366
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > > The information contained in this transmission may contain
> privileged
> > > > and confidential information of Big Tree Entertainment Pvt Ltd,
> > including
> > > > information protected by privacy laws. It is intended only for the
> use
> > of
> > > > Big Tree Entertainment Pvt Ltd. If you are not the intended
> recipient,
> > you
> > > > are hereby notified that any review, dissemination, distribution, or
> > > > duplication of this communication is strictly prohibited. If you are
> > not
> > > > the intended recipient, please contact the sender by reply email and
> > > > destroy all copies of the original message. Although Big Tree
> > Entertainment
> > > > Pvt Ltd. has taken reasonable precautions to ensure no viruses are
> > present
> > > > in this email, Big Tree Entertainment Pvt Ltd. cannot accept
> > responsibility
> > > > for any loss or damage arising from the use of this email or
> > attachments.
> > > > Computer viruses can be transmitted via email. Recipient should check
> > the
> > > > email and any attachments for the presence of viruses before using
> > them.
> > > > Any views or opinions are solely those of th
> > > > e author and do not necessarily represent those of Big Tree
> > Entertainment
> > > > Pvt Ltd.
> > > >
> > >
> > >
> >
> >
>

