Hi Allen,

Thanks for the valuable feedback! Can we retarget this to master? The 0.5.3 RM can then backport it on top of 0.5.2.
Sharing what I know on these points:

- Once you install the Checkstyle plugin, you can set up IntelliJ to use the checkstyle file as the code style. That has been working fairly well for me, at least.
- Hmm, master does not have commons. Did you use checkstyle from master or 0.5.1? We disallowed some libraries like Apache Commons and Guava since they cause jar/class mismatches a lot when integrating into all these query engines :)
- Agreed. `public abstract class KeyGenerator implements Serializable` should have taken care of it, I would think. You are referring to the KeyGenerator impl, right?
- Docs are definitely being worked on. Pratyaksh has the JIRA assigned for now, I think. IMO we can add this to the `writing data` page.

On Wed, May 6, 2020 at 12:20 PM Allen Underwood <[email protected]> wrote:

> Hello Vinoth,
>
> As promised, here's a PR into 0.5.2. I think it might be worth bringing
> that into master / 0.5.3 as well, but I figured I'd at least get this PR
> out there for someone to review.
> https://github.com/apache/incubator-hudi/pull/1597
>
> For what it's worth, there were definitely some pain points I encountered:
> * checkstyle.xml is not supported well at all in IntelliJ. I downloaded a
> plugin and it didn't help, so I had to compile, find errors, rinse / repeat.
> * There are libraries included in the pom.xml (org.apache.commons.*) that
> for some reason are not allowed per the checkstyle.xml. That doesn't make
> sense: why have them in the project if you can't use them?
> * I would have thought everything was good to go after running unit tests,
> but when deploying to a real cluster, I found that the entire class had to be
> serializable. It would have been nice to know that beforehand, as that would
> have saved several cycles. Probably worth documenting somewhere?
> * I don't really know where I should document these changes, as I only found
> out how to do these things via Vinoth's original reply to my email. It would
> be nice if there were some sort of "extending Hudi" documentation somewhere.
>
> Hope this becomes useful for someone else. FYI, this is working
> perfectly for my use case. Unit tests show several different approaches,
> but I wouldn't mind throwing some documentation together to help folks out.
>
> Let me know if you need anything else to help move this along. Surely I
> can't be the only one that needed it! :-)
>
> Allen
>
> On Tue, May 5, 2020 at 11:22 AM Vinoth Chandar <[email protected]> wrote:
>
>> Great!
>>
>> On Mon, May 4, 2020 at 5:43 PM Allen Underwood <[email protected]> wrote:
>>
>> > Hi Vinoth,
>> >
>> > Yes, I was going to set some things up in the morning. I'll let you know
>> > how it turns out, and if it's worth a PR I'll get one together.
>> >
>> > Thanks again for your help!
>> >
>> > Allen
>> >
>> > On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar <[email protected]> wrote:
>> >
>> >> Thanks both!
>> >>
>> >> @allen, heard this many times :) I hear you. You could write a small class
>> >> yourself with your custom logic and throw it in there?
>> >>
>> >> If you think there is a way to fix the key generator in Hudi to be more
>> >> resilient to these (e.g. taking in a list of supported patterns vs. just
>> >> the one), let us know.
>> >>
>> >> On Mon, May 4, 2020 at 3:08 PM Allen Underwood <[email protected]> wrote:
>> >>
>> >> > Hi Vinoth, that was extremely helpful. I almost had it working, HOWEVER,
>> >> > it appears some of my dates have the ms on the end and others don't.
>> >> > So if I pick a time format with the ms, the ones without fail, and
>> >> > vice versa. Good times.
>> >> >
>> >> > After I figure this out I'll see if I can put this information somewhere
>> >> > easy to find.
>> >> >
>> >> > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar <[email protected]> wrote:
>> >> >
>> >> >> Hi Allen,
>> >> >>
>> >> >> You can configure the key generator for DeltaStreamer using this
>> >> >> property (either via a file or --config):
>> >> >> hoodie.datasource.write.keygenerator.class
>> >> >>
>> >> >> You might be interested in this built-in generator:
>> >> >> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java#L64
>> >> >>
>> >> >> It lets you configure a field as the recordKey, and if you can parse
>> >> >> your timestamp using Java SimpleDateFormat, you can specify the datetime
>> >> >> field and a pattern to parse it into.
>> >> >>
>> >> >> Happy to make this work for you.
>> >> >>
>> >> >> Community, any volunteers to FAQ/document this? :)
>> >> >>
>> >> >>
>> >> >> On Mon, May 4, 2020 at 9:11 AM Allen Underwood <[email protected]> wrote:
>> >> >>
>> >> >> > I've tried to do my due diligence by googling / searching this Slack,
>> >> >> > and I've come up empty. Is there a way through configuration /
>> >> >> > DeltaStreamer to extract a custom partition key? Basically, I have a
>> >> >> > datetime field in a Kafka source that holds an ISO8601 datetime. Is there
>> >> >> > a way to extract a partition value out of that? I found this after some
>> >> >> > Googling, but it seems like it'd only be useful if I wanted to write my
>> >> >> > own writer application:
>> >> >> >
>> >> >> > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
>> >> >> >
>> >> >> > Any way to do what I need through configuration of the Spark job / Hudi
>> >> >> > configuration?
>> >> >> >
>> >> >> > --
>> >> >> > *Allen Underwood*
>> >> >
>> >> > --
>> >> > *Allen Underwood*
>> >> > Principal Software Engineer
>> >> > Broadcom | Symantec Enterprise Division
>> >> > *Mobile*: 404.808.5926
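
For reference, the built-in route described in this thread (pointing `hoodie.datasource.write.keygenerator.class` at `TimestampBasedKeyGenerator`) might look roughly like the DeltaStreamer properties below, built in Java so they can be printed into a props file. This is a minimal sketch, not verified against a specific release: the `hoodie.deltastreamer.keygen.timebased.*` key names and the `DATE_STRING` value are given as I understand them from that class and should be checked against its `Config` in your Hudi version, and the field names `id` and `event_time` plus both date patterns are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.Properties;

/**
 * Sketch: DeltaStreamer properties selecting the built-in TimestampBasedKeyGenerator.
 * ASSUMPTIONS: the timebased.* key names and the DATE_STRING value should be checked
 * against TimestampBasedKeyGenerator's Config in your Hudi version; "id", "event_time"
 * and both date patterns are hypothetical placeholders.
 */
public class KeyGenConfigSketch {

    public static Properties timestampKeyGenProps() {
        Properties props = new Properties();
        // The property that selects the key generator class.
        props.setProperty("hoodie.datasource.write.keygenerator.class",
                "org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator");
        // Hypothetical record key field and datetime field used for partitioning.
        props.setProperty("hoodie.datasource.write.recordkey.field", "id");
        props.setProperty("hoodie.datasource.write.partitionpath.field", "event_time");
        // Assumed keys: treat the datetime field as a string, parse it with ONE
        // SimpleDateFormat pattern, and re-format it as the partition path.
        props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING");
        props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat",
                "yyyy-MM-dd'T'HH:mm:ssXXX");
        props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd");
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Writes .properties syntax (':' and '=' are escaped) suitable for a props file.
        timestampKeyGenProps().store(System.out, "TimestampBasedKeyGenerator sketch");
    }
}
```

Note that the single input pattern here is exactly what breaks when only some values carry milliseconds.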

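The custom-class route can also be sketched in plain Java without Hudi on the classpath. This covers Vinoth's idea of accepting a list of supported patterns and Allen's finding that the class must be serializable. `MultiPatternPartitionExtractor` and `roundTrip` are hypothetical names, not part of Hudi's API, and wiring this into an actual `KeyGenerator` subclass is left out. The sketch uses `java.time.format.DateTimeFormatter` in place of the `SimpleDateFormat` mentioned in the thread. It tries each pattern in order, normalizes to UTC before building the partition path (an assumption about the desired partitioning), and keeps only serializable state. Because `DateTimeFormatter` is not `Serializable`, the compiled formatters are `transient` and rebuilt lazily. The Java-serialization round trip lets a plain unit test catch a non-serializable field before the code ever reaches a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: derives a partition path from a datetime string using several patterns. */
public class MultiPatternPartitionExtractor implements Serializable {
    private static final long serialVersionUID = 1L;

    // Pattern strings are serializable; DateTimeFormatter is not, so compiled
    // formatters are transient and rebuilt lazily after deserialization.
    private final List<String> inputPatterns;
    private final String outputPattern;
    private transient List<DateTimeFormatter> inputFormatters;
    private transient DateTimeFormatter outputFormatter;

    public MultiPatternPartitionExtractor(List<String> inputPatterns, String outputPattern) {
        this.inputPatterns = new ArrayList<>(inputPatterns);
        this.outputPattern = outputPattern;
    }

    private void initFormatters() {
        if (inputFormatters == null) {
            List<DateTimeFormatter> built = new ArrayList<>();
            for (String pattern : inputPatterns) {
                built.add(DateTimeFormatter.ofPattern(pattern));
            }
            inputFormatters = built;
            outputFormatter = DateTimeFormatter.ofPattern(outputPattern);
        }
    }

    /** Tries each input pattern in order; the first successful parse is converted to UTC and formatted. */
    public String partitionPath(String value) {
        initFormatters();
        for (DateTimeFormatter formatter : inputFormatters) {
            try {
                OffsetDateTime parsed = OffsetDateTime.parse(value, formatter);
                return parsed.withOffsetSameInstant(ZoneOffset.UTC).format(outputFormatter);
            } catch (DateTimeParseException e) {
                // This pattern does not match; fall through to the next one.
            }
        }
        throw new IllegalArgumentException("No configured pattern matches: " + value);
    }

    /** Java-serialization round trip, similar in spirit to shipping an object to Spark executors. */
    public static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        MultiPatternPartitionExtractor extractor = new MultiPatternPartitionExtractor(
                Arrays.asList("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "yyyy-MM-dd'T'HH:mm:ssXXX"),
                "yyyy/MM/dd");
        // Serialize first: a non-serializable field would fail here, not on the cluster.
        MultiPatternPartitionExtractor copy = roundTrip(extractor);
        System.out.println(copy.partitionPath("2020-05-04T12:23:45.123Z"));  // 2020/05/04
        System.out.println(copy.partitionPath("2020-05-04T12:23:45Z"));      // 2020/05/04
        System.out.println(copy.partitionPath("2020-05-04T23:30:00-02:00")); // 2020/05/05 (UTC)
    }
}
```

For this particular millisecond case, a single pattern with an optional section such as `yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX` would also work. The explicit list extends more naturally to inputs whose formats differ in more than one place.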