Re: HoodieMultiTableDeltaStreamer failing due to missing file path delimiter

2021-10-12 Thread Philip Ittmann
Hi Pratyaksh,

Thank you very much for your help. After reading through your advice, the
workaround I came to was setting `--config-folder s3://{{
s3_config_bucket }}//` (note the double forward slash at the end). This
circumvents the slash being deleted at the end of the `--config-folder`
parameter [1]. I am still unsure what the root cause of the problem
is, but I can move forward with my work using the workaround. Thank you
again for your assistance!

Best wishes,
Philip

[1]
https://github.com/apache/hudi/blob/da65d3cae99e8fee0ede9b5ed8630a3716d284c8/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L345
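
For readers following the thread, here is a minimal sketch of why the doubled slash may help. It uses only `java.net.URI`, whose `checkPath` appears in the stack trace below. The class name, the helper `stripOneTrailingSlash`, the join rule in `resolveConfigFile`, and the bucket name `example-bucket` are all assumptions made for illustration; they are not taken from the Hudi or Hadoop sources and only approximate the behaviour described in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ConfigFolderSlashDemo {

    // Hypothetical stand-in for the trailing-slash removal described above:
    // drops at most one trailing '/' from the --config-folder value.
    static String stripOneTrailingSlash(String folder) {
        return folder.endsWith("/") ? folder.substring(0, folder.length() - 1) : folder;
    }

    // Assumed join rule: the file name is appended to the folder's path
    // component, and a folder whose path is empty contributes no separator.
    static String resolveConfigFile(String configFolderArg, String fileName) {
        URI folder = URI.create(stripOneTrailingSlash(configFolderArg));
        String parentPath = folder.getPath();
        String childPath;
        if (parentPath.isEmpty()) {
            childPath = fileName;                    // no leading '/': a relative path
        } else if (parentPath.endsWith("/")) {
            childPath = parentPath + fileName;
        } else {
            childPath = parentPath + "/" + fileName;
        }
        try {
            // The multi-argument constructor rejects a relative path when a scheme is given.
            return new URI(folder.getScheme(), folder.getAuthority(), childPath, null, null).toString();
        } catch (URISyntaxException e) {
            return "ERROR: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        String file = "db.table.properties";
        // Single trailing slash: stripped to the bare bucket, whose path is empty.
        System.out.println(resolveConfigFile("s3://example-bucket/", file));
        // Double trailing slash: one slash survives, so the path is "/".
        System.out.println(resolveConfigFile("s3://example-bucket//", file));
    }
}
```

Under these assumptions the first call reports "Relative path in absolute URI", while the second yields `s3://example-bucket/db.table.properties`. That is consistent with the workaround, though it does not confirm the actual root cause.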

On Fri, Oct 8, 2021 at 4:43 PM Pratyaksh Sharma 
wrote:

> Hi Philip,
>
> I checked the configs that you are passing and they all look good. Indeed,
> the problem is the missing forward slash, which should not happen in
> general. Can you try printing the configs once to see whether the configFile
> path is being passed properly?
>
> Also, as a workaround, you can create `s3://{{ bucket_name }}/{{
> hive_database }}_{{ hive_table }}_config.properties` as the properties file
> for the table's overridden properties and omit the property
> `hoodie.deltastreamer.ingestion.{{ hive_database }}.{{ hive_table
> }}.configFile` from the deltastreamer.properties file entirely. You can find
> more information here:
> https://hudi.apache.org/blog/2020/08/22/ingest-multiple-tables-using-hudi.
>
> Hope that helps!
>
> On Wed, Oct 6, 2021 at 4:36 PM Philip Ittmann wrote:
>
> > Good day,
> >
> > I am experiencing difficulty in getting a HoodieMultiTableDeltaStreamer
> > application to run successfully via spark-submit on an AWS EMR cluster
> > with the following versions:
> >
> > Hudi release: 0.7.0
> >
> > Release label: emr-6.3.0
> > Hadoop distribution: Amazon 3.2.1
> > Applications: Tez 0.9.2, Spark 3.1.1, Hive 3.1.2, Presto 0.245.1
> >
> > The error I am seeing follows below, but the gist of the problem seems to
> > be related to leaving out a forward slash between the bucket name and the
> > filename when initializing a Path object.
> >
> > After the error stack trace I include the spark-submit command as well as
> > the properties file I am using. Any help would be greatly appreciated!
> >
> > Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://{{ bucket_name }}{{ hive_database }}.{{ hive_table }}.properties
> > at org.apache.hadoop.fs.Path.initialize(Path.java:263)
> > at org.apache.hadoop.fs.Path.<init>(Path.java:161)
> > at org.apache.hadoop.fs.Path.<init>(Path.java:119)
> > at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.checkIfTableConfigFileExists(HoodieMultiTableDeltaStreamer.java:99)
> > at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateTableExecutionContextList(HoodieMultiTableDeltaStreamer.java:116)
> > at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.<init>(HoodieMultiTableDeltaStreamer.java:80)
> > at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:203)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> > at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
> > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038)
> > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> > Caused by: java.net.URISyntaxException: Relative path in absolute URI: s3://{{ bucket_name }}{{ hive_database }}.{{ hive_table }}.properties
> > at java.net.URI.checkPath(URI.java:1823)
> > at java.net.URI.<init>(URI.java:745)
> > at org.apache.hadoop.fs.Path.initialize(Path.java:260)
> > ... 18 more
> >
> > The spark-submit command I am running is:
> >
> > spark-submit \
> > --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
> > --master yarn \
> > --deploy-mode client \
> > --num-executors 10 \
> > --executor-memory 3g \
> > --driver-memory 6g \

Re: Monthly or Bi-Monthly Dev meeting?

2021-10-12 Thread Gary Li
Hi Vinoth,

In summertime, 8 AM PST is 11 PM in China, so I guess it works for some folks,
but after switching to wintertime it is 12 AM in China, which might be a bit
late IMO. Does 3 PM UTC (7 AM PST in winter, 8 AM in summer) work?

Best,
Gary

On Tue, Oct 5, 2021 at 9:20 PM Pratyaksh Sharma 
wrote:

> Works for me in India :)
>
> On Tue, Oct 5, 2021 at 9:41 AM Vinoth Chandar  wrote:
>
> > Looks like there is enough interest here.
> >
> > Moving on to timing. Does 8 AM PST on the second Thursday of every
> > month work for everyone?
> > This is the time I find works best for most time zones.
> >
> > On Thu, Sep 23, 2021 at 1:15 PM Y Ethan Guo 
> > wrote:
> >
> > > +1 on monthly community sync.
> > >
> > > On Thu, Sep 23, 2021 at 12:32 PM Udit Mehrotra wrote:
> > >
> > > > +1 for the monthly meeting. It would be great to start syncing up
> > > > again. Thanks Vinoth for bringing it up!
> > > >
> > > > On Thu, Sep 23, 2021 at 12:14 PM Sivabalan wrote:
> > > > >
> > > > > +1 on monthly meet up.
> > > > >
> > > > > On Thu, Sep 23, 2021 at 11:01 AM vino yang wrote:
> > > > >
> > > > > > +1 for monthly
> > > > > >
> > > > > > Best,
> > > > > > Vino
> > > > > >
> > > > > > On Thu, Sep 23, 2021 at 9:36 PM Pratyaksh Sharma wrote:
> > > > > >
> > > > > > > Monthly should be good. Been a long time since we connected in
> > > > > > > these meetings. :)
> > > > > > >
> > > > > > > On Thu, Sep 23, 2021 at 7:02 PM Vinoth Chandar <
> > > > > > > mail.vinoth.chan...@gmail.com> wrote:
> > > > > > >
> > > > > > > > 1 hour monthly is what I was proposing to be specific.
> > > > > > > >
> > > > > > > > On Thu, Sep 23, 2021 at 6:30 AM Gary Li wrote:
> > > > > > > >
> > > > > > > > > +1 for monthly.
> > > > > > > > >
> > > > > > > > > On Thu, Sep 23, 2021 at 8:28 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > Once upon a time, we used to have a weekly community sync.
> > > > > > > > > > Wondering if there is interest in having a monthly or
> > > > > > > > > > bi-monthly dev meeting?
> > > > > > > > > >
> > > > > > > > > > Agenda could be
> > > > > > > > > > - Update/Summary of all dev work tracks
> > > > > > > > > > - Show and tell, where people can present their ongoing work
> > > > > > > > > > - Open floor discussions, bring up new issues.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Vinoth
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > -Sivabalan
> > > >
> > >
> >
>