Hi Vinoth, I managed to get HoodieJavaApp running in my local Maven project; to do so I had to copy the classes that HoodieJavaApp uses into the project. Inside my HoodieJavaTest main I create a HoodieJavaApp instance, which just runs with all default options.
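For context, the wrapper is essentially this (a minimal sketch; HoodieJavaTest is my own class name, and I am assuming a no-argument invocation leaves all of HoodieJavaApp's built-in defaults in place):

    // HoodieJavaTest.java -- minimal wrapper around the copied HoodieJavaApp class.
    public class HoodieJavaTest {
        public static void main(String[] args) throws Exception {
            // Delegate to HoodieJavaApp's own main with no arguments; per the
            // stack trace below, this constructs the app and calls run() with
            // all default options (HoodieJavaApp.java:67 -> run at :143).
            HoodieJavaApp.main(new String[0]);
        }
    }

(Alternatively, as Jing pointed out earlier in this thread, building from source produces hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar, which contains these classes, so copying sources can be avoided.)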
However, when I run it I get the following error, which looks like a runtime dependency that is missing or mismatched. Please guide.

Exception in thread "main" com.uber.hoodie.exception.HoodieUpsertException: Failed to upsert for commit time 20190418210326
    at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:175)
    at com.uber.hoodie.DataSourceUtils.doWriteOperation(DataSourceUtils.java:153)
    at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:149)
    at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
    at HoodieJavaApp.run(HoodieJavaApp.java:143)
    at HoodieJavaApp.main(HoodieJavaApp.java:67)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 49, localhost, executor driver): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
    at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
    at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
    ... 13 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
    at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
    ... 15 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:374)
    at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:312)
    at com.uber.hoodie.table.WorkloadProfile.buildProfile(WorkloadProfile.java:64)
    at com.uber.hoodie.table.WorkloadProfile.<init>(WorkloadProfile.java:56)
    at com.uber.hoodie.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:428)
    at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:170)
    ... 8 more
Caused by: java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
    at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
    at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
    ... 13 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
    at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
    at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
    ... 15 more
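If I read the innermost cause correctly, the NoSuchMethodError points at a Hadoop version conflict rather than a missing jar: whatever hadoop-common the classpath resolves does not have the Configuration.addResource(Configuration) overload that Hoodie's ParquetUtils calls. A small probe like the one below (purely a diagnostic sketch, nothing Hudi-specific; the class name is mine) shows which jar Configuration is actually loaded from and whether the overload exists:

    import org.apache.hadoop.conf.Configuration;

    public class HadoopClasspathProbe {
        public static void main(String[] args) {
            // Which jar did Configuration get loaded from?
            System.out.println(Configuration.class.getProtectionDomain()
                    .getCodeSource().getLocation());
            try {
                // The overload the stack trace says is missing at runtime.
                Configuration.class.getMethod("addResource", Configuration.class);
                System.out.println("addResource(Configuration) is present");
            } catch (NoSuchMethodException e) {
                System.out.println("addResource(Configuration) is missing"
                        + " -> an older Hadoop wins on the classpath");
            }
        }
    }

Running mvn dependency:tree -Dincludes=org.apache.hadoop on the project should likewise show whether two different Hadoop versions are being pulled in.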
On Thu, Apr 18, 2019 at 7:53 PM Vinoth Chandar <[email protected]> wrote:

> Hi Umesh,
>
> IIUC, your suggestion is that one should be able to run the sample app
> without needing to check out and build the source code? That does seem
> fair to me. We'd have to move the test data generator out of tests to
> place this under source code.
>
> I am hoping something like hoodie-bench could be a more comprehensive
> replacement for this mid term.
> https://github.com/apache/incubator-hudi/pull/623/files Thoughts?
>
> But, in the short term, let us know if it becomes too cumbersome for you
> to try out HoodieJavaApp.
>
> Thanks
> Vinoth
>
> On Thu, Apr 18, 2019 at 6:00 AM Umesh Kacha <[email protected]> wrote:
>
> > I can see there is a TODO to do what I suggested:
> >
> > #TODO - Need to move TestDataGenerator and HoodieJavaApp out of tests
> >
> > On Thu, Apr 18, 2019 at 2:23 PM Umesh Kacha <[email protected]> wrote:
> >
> > > OK, this useful class should have been part of a utility module and
> > > should run out of the box, since IMHO a developer need not
> > > necessarily build the project. I tried to create a Maven project
> > > where I kept hoodie-spark-bundle as a dependency and copied the
> > > HoodieJavaApp and DataSourceTestUtils classes into it, but it does
> > > not compile. I have been told here that hoodie-spark-bundle is an
> > > uber jar, but I doubt it is.
> > >
> > > On Thu, Apr 18, 2019 at 1:44 PM Jing Chen <[email protected]> wrote:
> > >
> > >> Hi Umesh,
> > >> I believe HoodieJavaApp is a test class under hoodie-spark.
> > >> AFAIK, test classes are not supposed to be included in the artifact.
> > >> However, if you want to build an artifact where you have access to
> > >> test classes, you would build from source code.
> > >> Once you build the hoodie project, you will be able to find a test
> > >> jar that includes HoodieJavaApp under
> > >> hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar.
> > >>
> > >> Thanks
> > >> Jing
> > >>
> > >> On Wed, Apr 17, 2019 at 11:10 PM Umesh Kacha <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi, I am not able to import the class HoodieJavaApp using any of
> > >> > the Maven jars. I tried both hoodie-spark-bundle and hoodie-spark;
> > >> > it simply does not find this class. I am using 0.4.5. Please guide.
> > >> >
> > >> > Regards,
> > >> > Umesh
