Hello Vinoth,

As promised, here's a PR into 0.5.2 - I think it might be worth bringing
that into master / 0.5.3 as well.  But I figured I'd at least get this PR
out there for someone to review.
https://github.com/apache/incubator-hudi/pull/1597

For what it's worth, there were definitely some pain points I encountered:
* checkstyle.xml - not supported well at all in IntelliJ - downloaded a
plugin and it didn't help - had to compile, find errors, rinse / repeat
* There are libraries included in the pom.xml that for some reason are not
allowed per the checkstyle.xml - doesn't make sense (org.apache.commons.*)
- why have them in the project if you can't use them?
* Would have thought everything was good to go after running unit tests,
but when deploying to a real cluster, found that the entire class had to be
serializable - would have been nice to know that beforehand as that would
have saved several cycles - probably worth documenting somewhere?
* Don't really know where I should document these changes as I only found
out how to do these things via Vinoth's original reply to my email - would
be nice if there was some sort of "extending Hudi" documentation somewhere

Hope this becomes useful for someone else.  FYI - this is working perfectly
for my use-case.  Unit tests show several different approaches but I
wouldn't mind throwing some documentation together to help folks out.
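
As a rough illustration of the multi-pattern idea discussed in this thread
(this is NOT the code in the PR): the sketch below tries a list of
SimpleDateFormat patterns in order and formats the first successful parse
into a partition path. The class name MultiPatternPartitionPath, its method,
and the example patterns are hypothetical and not part of Hudi. The class
stores only strings and implements Serializable, on the assumption that it
would need to be shipped to executors like the key generator was.

```java
import java.io.Serializable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper, not a Hudi class: resolves a partition path from a
// timestamp string that may arrive in any one of several formats.
class MultiPatternPartitionPath implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only plain strings are kept as fields so the object serializes cleanly.
    private final ArrayList<String> inputPatterns;
    private final String outputPattern;

    MultiPatternPartitionPath(List<String> inputPatterns, String outputPattern) {
        this.inputPatterns = new ArrayList<>(inputPatterns);
        this.outputPattern = outputPattern;
    }

    String partitionPath(String timestamp) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        for (String pattern : inputPatterns) {
            // SimpleDateFormat is not thread-safe, so build one per call.
            SimpleDateFormat in = new SimpleDateFormat(pattern);
            in.setLenient(false);
            in.setTimeZone(utc);
            try {
                Date parsed = in.parse(timestamp);
                SimpleDateFormat out = new SimpleDateFormat(outputPattern);
                out.setTimeZone(utc);
                return out.format(parsed);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        throw new IllegalArgumentException(
            "No configured pattern matched timestamp: " + timestamp);
    }
}
```

With the patterns "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" and "yyyy-MM-dd'T'HH:mm:ss'Z'"
and the output pattern "yyyy/MM/dd", both "2020-05-04T15:08:00.123Z" and
"2020-05-04T15:08:00Z" resolve to "2020/05/04", which covers the "some dates
have ms, some don't" case from earlier in the thread.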

Let me know if you need anything else to help move this along - surely I
can't be the only one that needed it!  :-)

Allen

On Tue, May 5, 2020 at 11:22 AM Vinoth Chandar <[email protected]> wrote:

> Great!
>
> On Mon, May 4, 2020 at 5:43 PM Allen Underwood
> <[email protected]> wrote:
>
> > Hi  Vinoth,
> >
> > Yes I was going to set some things up in the morning. I’ll let you know
> > how it turns out and if it’s worth a PR I’ll get one together.
> >
> > Thanks again for your help!
> >
> > Allen
> >
> > On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar <[email protected]> wrote:
> >
> >> Thanks both!
> >>
> >> @allen heard this many times :) hear you. You could write a small class
> >> yourself with your custom logic and throw it in there?
> >>
> >> If you think there is a way to fix the key generator in Hudi to be more
> >> resilient to these (e.g. taking in a list of supported patterns vs just
> >> the one), let us know.
> >>
> >> On Mon, May 4, 2020 at 3:08 PM Allen Underwood
> >> <[email protected]> wrote:
> >>
> >> > Hi Vinoth - that was extremely helpful... I almost had it working,
> >> > HOWEVER, it appears some of my dates have the ms on the end and others
> >> > don't... so if I pick a time format with them, then the ones without
> >> > them fail and vice versa... Good times.
> >> >
> >> > After I figure this out I'll see if I can put this information somewhere
> >> > easy to find.
> >> >
> >> > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar <[email protected]>
> >> wrote:
> >> >
> >> >> Hi Allen,
> >> >>
> >> >> You are able to configure the key generator for deltastreamer using
> >> >> this property (either via a file or --config )
> >> >> hoodie.datasource.write.keygenerator.class
> >> >>
> >> >> You might be interested in this built-in generator.
> >> >>
> >> >> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java#L64
> >> >>
> >> >> It lets you configure a field as a recordKey, and if you can parse
> >> >> your timestamp using Java SimpleDateFormat, you can specify the
> >> >> datetime field and a pattern to parse it into.
> >> >>
> >> >> Happy to make this work for you.
> >> >>
> >> >> community, any volunteers to faq/document this? :)
> >> >>
> >> >>
> >> >> On Mon, May 4, 2020 at 9:11 AM Allen Underwood
> >> >> <[email protected]> wrote:
> >> >>
> >> >> > I’ve tried to do my due diligence by googling / searching this slack
> >> >> > and I’ve come up empty. Is there a way through configuration /
> >> >> > deltastreamer to extract a custom partition key? Basically I have a
> >> >> > datetime field in a Kafka Source that has an ISO8601 datetime… is
> >> >> > there a way to extract a partition value out of that? I found this
> >> >> > after some Googling, but this seems like it’d only be useful if I
> >> >> > wanted to write my own writer application:
> >> >> >
> >> >> > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
> >> >> >
> >> >> > Any way to do what I need through configuration of the spark job /
> >> >> > hudi configuration?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > *Allen Underwood*
> >> >> >
> >> >>
> >> >
> >> >
> >> > --
> >> > *Allen Underwood*
> >> > Principal Software Engineer
> >> > Broadcom | Symantec Enterprise Division
> >> > *Mobile*: 404.808.5926
> >> >
> >>
> > --
> > *Allen Underwood*
> > Principal Software Engineer
> > Broadcom | Symantec Enterprise Division
> > *Mobile*: 404.808.5926
> >
>


-- 
*Allen Underwood*
Principal Software Engineer
Broadcom | Symantec Enterprise Division
*Mobile*: 404.808.5926
