Hi Vinoth, can you please help with this? I quickly want to try HoodieJavaApp;
it seems to be partially working in my local setup, with some runtime
dependency failures as mentioned in the previous email.

On Sat, Apr 20, 2019, 10:18 AM Umesh Kacha <[email protected]> wrote:

> Thanks Vinoth, yes please, that would be great: HoodieJavaApp moved out of
> tests and working.
>
> On Sat, Apr 20, 2019, 6:09 AM Vinoth Chandar <[email protected]> wrote:
>
>> Sorry, not following. If you are building your own Spark job using Hudi,
>> then you just pull in the hoodie-spark module:
>>
>> http://hudi.apache.org/writing_data.html#datasource-writer
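>>
>> As a minimal sketch in Java (the field names, table name, and base path
>> below are illustrative placeholders, not taken from this thread):
>>
>> import org.apache.spark.sql.Dataset;
>> import org.apache.spark.sql.Row;
>> import org.apache.spark.sql.SaveMode;
>>
>> // given an existing Dataset<Row> "df" holding the records to upsert
>> df.write().format("com.uber.hoodie")
>>     .option("hoodie.datasource.write.recordkey.field", "_row_key")      // record key column (assumed)
>>     .option("hoodie.datasource.write.partitionpath.field", "partition") // partition column (assumed)
>>     .option("hoodie.datasource.write.precombine.field", "timestamp")    // de-dupe ordering column (assumed)
>>     .option("hoodie.table.name", "hoodie_test_table")                   // hypothetical table name
>>     .mode(SaveMode.Append)
>>     .save("file:///tmp/hoodie/hoodie_test_table");                      // hypothetical base path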
>>
>>
>> The Spark bundle can be used with the --jars option on spark-shell etc. to
>> query the datasets.
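>>
>> For instance (the jar path is illustrative):
>>
>>   spark-shell --jars /path/to/hoodie-spark-bundle-0.4.5.jar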
>>
>> Does that help? Can you describe what you are trying to accomplish?
>>
>> Checking again, do you need a patch with the HoodieJavaApp moved out of
>> tests and working?
>>
>> On Fri, Apr 19, 2019 at 12:01 PM Umesh Kacha <[email protected]> wrote:
>>
>> > Thanks Vinoth. How do I know which Spark jars and versions are needed? I
>> > was expecting hoodie-spark-bundle-0.4.5.jar to provide them, since it's an
>> > uber jar, but it does not; I recently found I had to add the Spark Maven
>> > coordinates separately in the pom file. Anyway, if you can give me the
>> > list of jars, I can put them on a classpath and run.
>> >
>> > On Fri, Apr 19, 2019, 11:40 PM Vinoth Chandar <[email protected]> wrote:
>> >
>> > > Looks like a class mismatch error on the Hadoop jars. The easiest way to
>> > > do this is to pull the code into IntelliJ, add the Spark jars folder to
>> > > the module's classpath, and then run the test by right-clicking > Run.
>> > >
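>> > > If you want to confirm the mismatch first, here is a minimal sketch (the
>> > > class name is just illustrative) that prints which jar the Hadoop
>> > > Configuration class is actually loaded from at runtime:
>> > >
>> > > import org.apache.hadoop.conf.Configuration;
>> > >
>> > > public class HadoopJarCheck {
>> > >     public static void main(String[] args) {
>> > >         // prints the jar (or classes dir) that supplies Configuration;
>> > >         // an older Hadoop jar here would explain the NoSuchMethodError
>> > >         System.out.println(Configuration.class.getProtectionDomain()
>> > >             .getCodeSource().getLocation());
>> > >     }
>> > > }
>> > >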
>> > > I can prep a patch for you if you'd like. lmk
>> > >
>> > > Thanks
>> > > Vinoth
>> > >
>> > > On Thu, Apr 18, 2019 at 8:46 AM Umesh Kacha <[email protected]> wrote:
>> > >
>> > > > Hi Vinoth, I managed to run HoodieJavaApp in my local Maven project;
>> > > > there I had to copy the following classes, which are used by
>> > > > HoodieJavaApp. Inside the HoodieJavaTest main method I create a
>> > > > HoodieJavaApp object, which just runs with all default options.
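>> > > >
>> > > > For reference, a minimal sketch of that driver (the no-arg constructor
>> > > > and run() signature are my assumption from the stack trace below, not
>> > > > verified against the actual source):
>> > > >
>> > > > public class HoodieJavaTest {
>> > > >     public static void main(String[] args) throws Exception {
>> > > >         // run HoodieJavaApp with all default options, as described above
>> > > >         HoodieJavaApp app = new HoodieJavaApp();
>> > > >         app.run();
>> > > >     }
>> > > > }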
>> > > >
>> > > > [image: image.png]
>> > > >
>> > > > However, I get the following error, which looks like a missing runtime
>> > > > dependency. Please guide.
>> > > >
>> > > > Exception in thread "main" com.uber.hoodie.exception.HoodieUpsertException: Failed to upsert for commit time 20190418210326
>> > > >   at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:175)
>> > > >   at com.uber.hoodie.DataSourceUtils.doWriteOperation(DataSourceUtils.java:153)
>> > > >   at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:149)
>> > > >   at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
>> > > >   at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
>> > > >   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
>> > > >   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
>> > > >   at HoodieJavaApp.run(HoodieJavaApp.java:143)
>> > > >   at HoodieJavaApp.main(HoodieJavaApp.java:67)
>> > > > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 49, localhost, executor driver): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >   at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
>> > > >   at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>> > > >   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> > > >   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> > > >   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
>> > > >   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> > > >   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>> > > >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>> > > >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>> > > >   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>> > > >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>> > > >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > > >   at java.lang.Thread.run(Thread.java:745)
>> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
>> > > >   at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
>> > > >   ... 13 more
>> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
>> > > >   at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
>> > > >   ... 15 more
>> > > >
>> > > > Driver stacktrace:
>> > > >   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
>> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
>> > > >   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> > > >   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>> > > >   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
>> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>> > > >   at scala.Option.foreach(Option.scala:257)
>> > > >   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
>> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>> > > >   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> > > >   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
>> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
>> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
>> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
>> > > >   at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
>> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> > > >   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>> > > >   at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
>> > > >   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
>> > > >   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
>> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> > > >   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>> > > >   at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:374)
>> > > >   at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:312)
>> > > >   at com.uber.hoodie.table.WorkloadProfile.buildProfile(WorkloadProfile.java:64)
>> > > >   at com.uber.hoodie.table.WorkloadProfile.<init>(WorkloadProfile.java:56)
>> > > >   at com.uber.hoodie.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:428)
>> > > >   at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:170)
>> > > >   ... 8 more
>> > > > Caused by: java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >   at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
>> > > >   at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>> > > >   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> > > >   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> > > >   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
>> > > >   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> > > >   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>> > > >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>> > > >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>> > > >   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>> > > >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>> > > >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > > >   at java.lang.Thread.run(Thread.java:745)
>> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
>> > > >   at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
>> > > >   ... 13 more
>> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
>> > > >   at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
>> > > >   at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
>> > > >   ... 15 more
>> > > >
>> > > > On Thu, Apr 18, 2019 at 7:53 PM Vinoth Chandar <[email protected]> wrote:
>> > > >
>> > > >> Hi Umesh,
>> > > >>
>> > > >> IIUC, your suggestion is that one should be able to run the sample app
>> > > >> without needing to check out/build the source code? That does seem fair
>> > > >> to me. We would have to move the test data generator out of tests to
>> > > >> place this under source code.
>> > > >>
>> > > >> I am hoping something like hoodie-bench could be a more comprehensive
>> > > >> replacement for this in the mid-term. Thoughts?
>> > > >> https://github.com/apache/incubator-hudi/pull/623/files
>> > > >>
>> > > >> But, in the short term, let us know if it becomes too cumbersome for
>> > > >> you to try out HoodieJavaApp.
>> > > >>
>> > > >> Thanks
>> > > >> Vinoth
>> > > >>
>> > > >> On Thu, Apr 18, 2019 at 6:00 AM Umesh Kacha <[email protected]> wrote:
>> > > >>
>> > > >> > I can see there is a TODO to do what I suggested:
>> > > >> >
>> > > >> > #TODO - Need to move TestDataGenerator and HoodieJavaApp out of tests
>> > > >> >
>> > > >> > On Thu, Apr 18, 2019 at 2:23 PM Umesh Kacha <[email protected]> wrote:
>> > > >> >
>> > > >> > > OK, this useful class should have been part of a utility module and
>> > > >> > > should run out of the box, as IMHO a developer need not necessarily
>> > > >> > > build the project. I tried to create a Maven project where I kept
>> > > >> > > hoodie-spark-bundle as a dependency and copied the HoodieJavaApp and
>> > > >> > > DataSourceTestUtils classes into it, but it does not compile. I have
>> > > >> > > been told here that hoodie-spark-bundle is an uber jar, but I doubt
>> > > >> > > it is.
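>> > > >> > >
>> > > >> > > For example, listing the bundle's contents would show what is
>> > > >> > > actually shaded into it (jar name as published for 0.4.5):
>> > > >> > >
>> > > >> > >   jar tf hoodie-spark-bundle-0.4.5.jar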
>> > > >> > >
>> > > >> > > On Thu, Apr 18, 2019 at 1:44 PM Jing Chen <[email protected]> wrote:
>> > > >> > >
>> > > >> > >> Hi Umesh,
>> > > >> > >> I believe *HoodieJavaApp* is a test class under *hoodie-spark*.
>> > > >> > >> AFAIK, test classes are not supposed to be included in the artifact.
>> > > >> > >> However, if you want an artifact that gives you access to the test
>> > > >> > >> classes, you would build from source code. Once you build the hoodie
>> > > >> > >> project, you will find a test jar that includes *HoodieJavaApp* under
>> > > >> > >> *hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar*.
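>> > > >> > >>
>> > > >> > >> For example (assuming the standard Maven build from the repo root;
>> > > >> > >> skipping test execution still compiles and packages test classes):
>> > > >> > >>
>> > > >> > >>   mvn clean package -DskipTests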
>> > > >> > >>
>> > > >> > >> Thanks
>> > > >> > >> Jing
>> > > >> > >>
>> > > >> > >> On Wed, Apr 17, 2019 at 11:10 PM Umesh Kacha <[email protected]> wrote:
>> > > >> > >>
>> > > >> > >> > Hi, I am not able to import the class HoodieJavaApp using any of
>> > > >> > >> > the Maven jars. I tried both hoodie-spark-bundle and hoodie-spark;
>> > > >> > >> > it simply does not find this class. I am using 0.4.5. Please guide.
>> > > >> > >> >
>> > > >> > >> > Regards,
>> > > >> > >> > Umesh
>> > > >> > >> >
>> > > >> > >>
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>
