Hi Umesh,

This is at the top of my list for the week. But if you already have input
data somewhere on S3/HDFS, nothing stops you from trying the DeltaStreamer
tool or writing a simple Spark job depending on hoodie-spark. What's your
eventual deployment strategy?
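
If you go the Spark-job route, here is a minimal sketch of a writer
(untested; the paths, table name, and field names are placeholders for your
own, and the option keys are the ones documented for the datasource writer):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object HudiWriteSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hudi-write-sketch")
          .getOrCreate()

        // Input you already have on s3/hdfs (path is a placeholder).
        val input = spark.read.json("hdfs:///tmp/input")

        // Upsert into a hoodie dataset via the datasource writer;
        // upsert is the default write operation.
        input.write
          .format("com.uber.hoodie")
          .option("hoodie.table.name", "my_table")
          .option("hoodie.datasource.write.recordkey.field", "_row_key")
          .option("hoodie.datasource.write.partitionpath.field", "partition")
          .option("hoodie.datasource.write.precombine.field", "timestamp")
          .mode(SaveMode.Append)
          .save("hdfs:///tmp/hoodie/my_table")

        spark.stop()
      }
    }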

Thanks
Vinoth

On Mon, Apr 22, 2019 at 6:09 AM Umesh Kacha <[email protected]> wrote:

> Hi Vinoth, can you please help with this? I want to try HoodieJavaApp
> quickly; it seems to be partially working in my local setup, with some
> runtime dependency failures, as mentioned in the previous email.
>
> On Sat, Apr 20, 2019, 10:18 AM Umesh Kacha <[email protected]> wrote:
>
> > Thanks Vinoth, yes please, that would be great: HoodieJavaApp moved out
> > of tests and working.
> >
> > On Sat, Apr 20, 2019, 6:09 AM Vinoth Chandar <
> > [email protected]> wrote:
> >
> >> Sorry, not following. If you are building your own Spark job using
> >> hudi, then you just pull in the hoodie-spark module:
> >>
> >> http://hudi.apache.org/writing_data.html#datasource-writer
> >>
> >>
> >> The Spark bundle can be used with the --jars option on spark-shell etc.
> >> to query the datasets.
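> >>
> >> For example (jar path and dataset path are placeholders; the glob depth
> >> depends on how the dataset is partitioned):
> >>
> >>   spark-shell --jars /path/to/hoodie-spark-bundle-0.4.5.jar
> >>
> >>   scala> spark.read.format("com.uber.hoodie")
> >>            .load("/tmp/hoodie/my_table/*/*/*/*").count()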
> >>
> >> Does that help? Can you describe what you are trying to accomplish?
> >>
> >> Checking again, do you need a patch with the HoodieJavaApp moved out of
> >> tests and working?
> >>
> >> On Fri, Apr 19, 2019 at 12:01 PM Umesh Kacha <[email protected]>
> >> wrote:
> >>
> >> > Thanks Vinoth. How do I know which Spark jars and versions I need? I
> >> > was expecting hoodie-spark-bundle-0.4.5.jar to cover that, since it's
> >> > an uber jar, but it doesn't; I recently found I had to add the Spark
> >> > Maven coordinates separately in the pom file. Anyway, if you can give
> >> > me a list of jars, I can put them on the classpath and run.
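> >> >
> >> > For reference, roughly what I ended up adding (illustrative only; the
> >> > versions are my guess at what hoodie 0.4.5 builds against):
> >> >
> >> >   <dependency>
> >> >     <groupId>org.apache.spark</groupId>
> >> >     <artifactId>spark-core_2.11</artifactId>
> >> >     <version>2.1.0</version>
> >> >   </dependency>
> >> >   <dependency>
> >> >     <groupId>org.apache.spark</groupId>
> >> >     <artifactId>spark-sql_2.11</artifactId>
> >> >     <version>2.1.0</version>
> >> >   </dependency>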
> >> >
> >> > On Fri, Apr 19, 2019, 11:40 PM Vinoth Chandar <[email protected]>
> >> wrote:
> >> >
> >> > > Looks like a class mismatch error on the Hadoop jars. The easiest
> >> > > way to do this is to pull the code into IntelliJ, add the Spark jars
> >> > > folder to the module's classpath, and then run the test by
> >> > > right-clicking > Run.
> >> > >
> >> > > I can prep a patch for you if you'd like. lmk
> >> > >
> >> > > Thanks
> >> > > Vinoth
> >> > >
> >> > > On Thu, Apr 18, 2019 at 8:46 AM Umesh Kacha <[email protected]>
> >> > wrote:
> >> > >
> >> > > > Hi Vinoth, I managed to get HoodieJavaApp running in my local
> >> > > > Maven project; I had to copy the following classes, which
> >> > > > HoodieJavaApp uses. Inside HoodieJavaTest's main I create a
> >> > > > HoodieJavaApp object, which just runs with all the default options.
> >> > > >
> >> > > > [image: image.png]
> >> > > >
> >> > > > However, I get the following error, which looks like a missing
> >> > > > runtime dependency. Please guide.
> >> > > >
> >> > > > Exception in thread "main" com.uber.hoodie.exception.HoodieUpsertException: Failed to upsert for commit time 20190418210326
> >> > > > at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:175)
> >> > > > at com.uber.hoodie.DataSourceUtils.doWriteOperation(DataSourceUtils.java:153)
> >> > > > at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:149)
> >> > > > at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
> >> > > > at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
> >> > > > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
> >> > > > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
> >> > > > at HoodieJavaApp.run(HoodieJavaApp.java:143)
> >> > > > at HoodieJavaApp.main(HoodieJavaApp.java:67)
> >> > > > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 49, localhost, executor driver): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
> >> > > > at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> >> > > > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> > > > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
> >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> > > > at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> >> > > > at org.apache.spark.scheduler.Task.run(Task.scala:99)
> >> > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> >> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> > > > at java.lang.Thread.run(Thread.java:745)
> >> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
> >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
> >> > > > ... 13 more
> >> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
> >> > > > at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
> >> > > > ... 15 more
> >> > > >
> >> > > > Driver stacktrace:
> >> > > > at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
> >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
> >> > > > at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >> > > > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> >> > > > at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> >> > > > at scala.Option.foreach(Option.scala:257)
> >> > > > at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
> >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
> >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> >> > > > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> >> > > > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
> >> > > > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
> >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> >> > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> >> > > > at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
> >> > > > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
> >> > > > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
> >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> >> > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> >> > > > at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:374)
> >> > > > at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:312)
> >> > > > at com.uber.hoodie.table.WorkloadProfile.buildProfile(WorkloadProfile.java:64)
> >> > > > at com.uber.hoodie.table.WorkloadProfile.<init>(WorkloadProfile.java:56)
> >> > > > at com.uber.hoodie.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:428)
> >> > > > at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:170)
> >> > > > ... 8 more
> >> > > > Caused by: java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
> >> > > > at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> >> > > > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> > > > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
> >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> > > > at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> >> > > > at org.apache.spark.scheduler.Task.run(Task.scala:99)
> >> > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> >> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> > > > at java.lang.Thread.run(Thread.java:745)
> >> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
> >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
> >> > > > ... 13 more
> >> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
> >> > > > at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
> >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
> >> > > > ... 15 more
> >> > > >
> >> > > > On Thu, Apr 18, 2019 at 7:53 PM Vinoth Chandar <[email protected]> wrote:
> >> > > >
> >> > > >> Hi Umesh,
> >> > > >>
> >> > > >> IIUC, your suggestion is that one should be able to run the
> >> > > >> sample app without needing to check out/build the source code?
> >> > > >> That does seem fair to me. We would have to move the test data
> >> > > >> generator out of tests to place this under source code.
> >> > > >>
> >> > > >> I am hoping something like hoodie-bench could be a more
> >> > > >> comprehensive replacement for this mid-term:
> >> > > >> https://github.com/apache/incubator-hudi/pull/623/files
> >> > > >> Thoughts?
> >> > > >>
> >> > > >> But, in the short term, let us know if it becomes too cumbersome
> >> > > >> for you to try out HoodieJavaApp.
> >> > > >>
> >> > > >> Thanks
> >> > > >> Vinoth
> >> > > >>
> >> > > >> On Thu, Apr 18, 2019 at 6:00 AM Umesh Kacha <[email protected]> wrote:
> >> > > >>
> >> > > >> > I can see there is a TODO to do what I suggested:
> >> > > >> >
> >> > > >> > #TODO - Need to move TestDataGenerator and HoodieJavaApp out of tests
> >> > > >> >
> >> > > >> > On Thu, Apr 18, 2019 at 2:23 PM Umesh Kacha <[email protected]> wrote:
> >> > > >> >
> >> > > >> > > OK, this useful class should have been part of a utility and
> >> > > >> > > should run out of the box; IMHO a developer need not
> >> > > >> > > necessarily build the project. I tried to create a Maven
> >> > > >> > > project where I kept hoodie-spark-bundle as a dependency and
> >> > > >> > > copied the HoodieJavaApp and DataSourceTestUtils classes into
> >> > > >> > > it, but it does not compile. I have been told here that
> >> > > >> > > hoodie-spark-bundle is an uber jar, but I doubt it is.
> >> > > >> > >
> >> > > >> > > On Thu, Apr 18, 2019 at 1:44 PM Jing Chen <[email protected]> wrote:
> >> > > >> > >
> >> > > >> > >> Hi Umesh,
> >> > > >> > >> I believe HoodieJavaApp is a test class under hoodie-spark.
> >> > > >> > >> AFAIK, test classes are not supposed to be included in the
> >> > > >> > >> artifact. However, if you want to build an artifact where you
> >> > > >> > >> have access to test classes, you would build from source code.
> >> > > >> > >> Once you build the hoodie project, you are able to find a test
> >> > > >> > >> jar that includes HoodieJavaApp under
> >> > > >> > >> hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar.
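> >> > > >> > >>
> >> > > >> > >> For example, a sketch of running it from the build output
> >> > > >> > >> (untested; module paths may differ, and spark-submit assumes
> >> > > >> > >> a local Spark install, with the bundle jar supplying the
> >> > > >> > >> hoodie classes HoodieJavaApp needs):
> >> > > >> > >>
> >> > > >> > >>   mvn clean package -DskipTests
> >> > > >> > >>   spark-submit --class HoodieJavaApp \
> >> > > >> > >>     --jars packaging/hoodie-spark-bundle/target/hoodie-spark-bundle-0.4.5-SNAPSHOT.jar \
> >> > > >> > >>     hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar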
> >> > > >> > >>
> >> > > >> > >> Thanks
> >> > > >> > >> Jing
> >> > > >> > >>
> >> > > >> > >> On Wed, Apr 17, 2019 at 11:10 PM Umesh Kacha <[email protected]> wrote:
> >> > > >> > >>
> >> > > >> > >> > Hi, I am not able to import the class HoodieJavaApp using
> >> > > >> > >> > any of the Maven jars. I tried both hoodie-spark-bundle and
> >> > > >> > >> > hoodie-spark. It simply does not find this class. I am
> >> > > >> > >> > using 0.4.5. Please guide.
> >> > > >> > >> >
> >> > > >> > >> > Regards,
> >> > > >> > >> > Umesh
> >> > > >> > >> >
> >> > > >> > >>
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
>
