Re:Re: [DISCUSS] Troubleshooting flow

2020-04-03 Thread lamber-ken


Thank you all,


Agree with Sudha, it's OK to answer simple questions and move debugging-type 
questions to GH issues.
So, let's try to guide users who ask debugging questions to use GH issues 
where possible.


Thanks,
Lamber-Ken





At 2020-04-03 07:19:26, "Bhavani Sudha"  wrote:
>Also one thing I wanted to note. I feel it should be okay to answer simple
>`what does this mean` type of questions in slack and move debugging type of
>questions to GH issues. What do you all think?
>
>Thanks,
>Sudha
>
>On Thu, Apr 2, 2020 at 11:45 AM Bhavani Sudha 
>wrote:
>
>> Agree on using GH issues to post code snippets or debugging issues.
>>
>> Regarding mirroring slack to commits, the last time I checked there were no
>> options readily available (there were one or two paid products). It looked
>> like we could possibly develop our own IFTTT / web hook on slack. Not
>> sure how much work that is.
>>
>>
>> Thanks,
>> Sudha
>>
>>
>> On Thu, Apr 2, 2020 at 8:40 AM Vinoth Chandar  wrote:
>>
>>> Hello all,
>>>
>>> Actually that's how we have been using GH issues.. Both slack/ml are
>>> inconvenient for sharing code and having long threaded conversations.
>>> (same
>>> issues raised here).
>>>
>>> That said, we could definitely formalize this and look to move slack
>>> threads into GH issue for triaging (then follow up with JIRA, if real bug)
>>> before they get too long.
>>>
>>> >>slack has some answerbot to auto reply and prompt users to create GH
>>> issues.
>>> Worth looking into.. There was also a conversation around mirroring
>>> #general into commits or something for indexing/searching.. ?
>>>
>>>
>>> On Thu, Apr 2, 2020 at 1:36 AM vino yang  wrote:
>>>
>>> > Hi Lamber-Ken,
>>> >
>>> > Thanks for raising this problem.
>>> >
>>> > >> 3. threads can't be indexed by search engines
>>> >
>>> > Yes, I always thought that it would be better to have a "users" ML, but
>>> it
>>> > is not clear whether only the Top-Level Project can have this ML.
>>> >
>>> > Best,
>>> > Vino
>>> >
>>> >
>>> > Shiyan Xu  wrote on Wed, Apr 1, 2020 at 4:54 AM:
>>> >
>>> > > Good idea to use GH issues as triage.
>>> > >
>>> > > Not sure if slack has some answerbot to auto reply and prompt users
>>> to
>>> > > create GH issues. If it can be configured that way, that'd be great
>>> for
>>> > > this purpose :)
>>> > >
>>> > > On Tue, 31 Mar 2020, 10:03 lamberken,  wrote:
>>> > >
>>> > > > Hi team,
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > Many users currently use slack to ask for support when they meet
>>> > > > bugs / problems.
>>> > > >
>>> > > > But there are some disadvantages we need to consider:
>>> > > >
>>> > > > 1. code snippets do not display well.
>>> > > >
>>> > > > 2. we may miss some questions when several come up at the same
>>> time.
>>> > > >
>>> > > > 3. threads can't be indexed by search engines
>>> > > >
>>> > > > ...
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > So, I suggest we should guide users to use GitHub issues as much as
>>> we
>>> > > can.
>>> > > >
>>> > > > step1: guide users to use GitHub issues to report their questions
>>> > > >
>>> > > > step2: developers can pick up some issues which they are interested
>>> in.
>>> > > >
>>> > > > step3: raise a related JIRA if needed
>>> > > >
>>> > > > step4: add some useful notes to troubleshooting guide
>>> > > >
>>> > > >
>>> > > >
>>> > > > Any thoughts are welcome, thanks : )
>>> > > >
>>> > > >
>>> > > > Best,
>>> > > > Lamber-Ken
>>> > >
>>> >
>>>
>>


Re:Re: New Committer: lamber-ken

2020-04-09 Thread lamber-ken


Dear team,


Thank you all, let us strive to work together!


Best,
Lamber-Ken





At 2020-04-09 03:49:12, "Shiyan Xu"  wrote:
>Congrats Lamber-ken! Well deserved!
>
>On Wed, Apr 8, 2020 at 4:52 AM Sivabalan  wrote:
>
>> Congrats Lamber! Well deserved.
>>
>> On Wed, Apr 8, 2020 at 5:21 AM Pratyaksh Sharma 
>> wrote:
>>
>> > Congratulations lamberken!
>> >
>> > On Wed, Apr 8, 2020 at 11:10 AM Jiayi Liao 
>> > wrote:
>> >
>> > > Congratulations!
>> > >
>> > > Best,
>> > > Jiayi Liao
>> > >
>> > > On Wed, Apr 8, 2020 at 12:15 PM tison  wrote:
>> > >
>> > > > Congrats lamber!
>> > > >
>> > > > Best,
>> > > > tison.
>> > > >
>> > > >
>> > > > vino yang  wrote on Wed, Apr 8, 2020 at 11:45 AM:
>> > > >
>> > > > > Congrats lamber! Well deserved!
>> > > > >
>> > > > > Best,
>> > > > > Vino
>> > > > >
>> > > > > leesf  wrote on Wed, Apr 8, 2020 at 9:30 AM:
>> > > > >
>> > > > > > Congrats lamber-ken, well deserved!
>> > > > > >
>> > > > > > Balaji Varadarajan  wrote on Wed, Apr 8,
>> > > 2020 at 6:45 AM:
>> > > > > >
>> > > > > > >  Many Congratulations Lamber-Ken.  Well deserved !!
>> > > > > > > Balaji.V
>> > > > > > > On Tuesday, April 7, 2020, 02:23:51 PM PDT, Y Ethan Guo <
>> > > > > > > ethan.guoyi...@gmail.com> wrote:
>> > > > > > >
>> > > > > > >  Congrats!!!
>> > > > > > >
>> > > > > > > On Tue, Apr 7, 2020 at 2:22 PM Gary Li <
>> yanjia.gary...@gmail.com
>> > >
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Congrats lamber! Well deserved!
>> > > > > > > >
>> > > > > > > > On Tue, Apr 7, 2020 at 2:18 PM Vinoth Chandar <
>> > vin...@apache.org
>> > > >
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hello Apache Hudi Community,
>> > > > > > > > >
>> > > > > > > > > The Podling Project Management Committee (PPMC) for Apache
>> > > > > > > > > Hudi (Incubating) has invited lamber-ken (Xie Lei) to
>> become
>> > a
>> > > > > > > committer
>> > > > > > > > > and we are pleased to announce that he has accepted.
>> > > > > > > > >
>> > > > > > > > > lamber-ken has had a large impact in hudi, with some
>> > > sustained
>> > > > > > > efforts
>> > > > > > > > > in the past several months. He has rebuilt our site ground
>> > up,
>> > > > > > > automated
>> > > > > > > > > doc workflows, helped fix a lot of bugs and also been
>> super
>> > > > > helpful
>> > > > > > > for
>> > > > > > > > > the community at large.
>> > > > > > > > >
>> > > > > > > > > Congratulations lamber-ken !! Please join me in recognizing
>> > his
>> > > > > > > efforts!
>> > > > > > > > >
>> > > > > > > > > On behalf of PPMC,
>> > > > > > > > > Vinoth
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>> --
>> Regards,
>> -Sivabalan
>>


Re:Re: New PPMC Member : Bhavani Sudha

2020-04-09 Thread lamber-ken



Congrats Sudha! Well deserved!


Best,
Lamber-Ken

At 2020-04-09 13:23:38, "Bhavani Sudha"  wrote:
>Thank you all :) I am excited to be part of Hudi PPMC.
>
>-Sudha
>
>On Wed, Apr 8, 2020 at 12:48 PM Shiyan Xu 
>wrote:
>
>> Congrats Sudha! Well deserved!
>>
>> On Tue, Apr 7, 2020 at 8:46 PM vino yang  wrote:
>>
>> > Congrats sudha, well deserved!
>> >
>> > Best,
>> > Vino
>> >
>> > leesf  wrote on Wed, Apr 8, 2020 at 9:31 AM:
>> >
>> > > Congrats sudha, well deserved!
>> > >
>> > > Balaji Varadarajan  wrote on Wed, Apr 8, 2020 at 6:55 AM:
>> > >
>> > > >  Congratulations Sudha :) Well deserved.  Welcome to PPMC.
>> > > > Balaji.V
>> > > >
>> > > > On Tuesday, April 7, 2020, 03:04:37 PM PDT, Gary Li <
>> > > > yanjia.gary...@gmail.com> wrote:
>> > > >
>> > > >  Congrats Sudha! Appreciated all the work you have done!
>> > > >
>> > > > On Tue, Apr 7, 2020 at 2:57 PM Y Ethan Guo > >
>> > > > wrote:
>> > > >
>> > > > > Congrats!!!
>> > > > >
>> > > > > On Tue, Apr 7, 2020 at 2:55 PM Vinoth Chandar 
>> > > wrote:
>> > > > >
>> > > > > > Hello all,
>> > > > > >
>> > > > > > I am very excited to share that we have new PPMC member - Sudha.
>> > She
>> > > > has
>> > > > > > been a great champion for the project for almost couple years
>> now,
>> > > > > driving
>> > > > > > a lot of presto/query engine facing changes and most of all being
>> > the
>> > > > > face
>> > > > > > of our community to new users on Slack, over the past few months.
>> > > > > >
>> > > > > > Please join me in congratulating her!
>> > > > > >
>> > > > > > On behalf of Hudi PPMC,
>> > > > > > Vinoth
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>


Re:Checking out the asf svn repo

2020-04-16 Thread lamber-ken


Hi Vinoth,


You can get help from the following documentation [1].


[1] https://infra.apache.org/version-control.html
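
In short (a sketch, assuming the repos/asf path mirrors the viewvc URL and
that your ASF committer credentials are used to authenticate):

# check out the projects directory from the ASF svn repo
svn checkout https://svn.apache.org/repos/asf/incubator/public/trunk/content/projects
cd projects

# edit hudi.xml, then commit with your ASF id; svn will prompt
# for your ASF password on the first commit
svn commit -m "Update hudi status file" --username <your-asf-id>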




Best,
Lamber-Ken




At 2020-04-17 08:24:01, "Vinoth Chandar"  wrote:
>Hello all,
>
>Can anyone here (potentially from prior experience with other apache
>projects) point me, to how I can checkout the apache svn repo here?
>https://svn.apache.org/viewvc/incubator/public/trunk/content/projects/hudi.xml?view=log
>
>
>Would like to make some edits to our status file.. Specifically, I am
>trying to understand how I authenticate to commit the changes?  (I have not
>used svn in years. So apologize if I am asking some basic qs)
>
>Thanks
>Vinoth


Re:[DISCUSS] Bug bash?

2020-04-22 Thread lamber-ken



Wow, a challenging job. +1


Best,
Lamber-Ken

At 2020-04-23 04:51:01, "Vinoth Chandar"  wrote:
>Just floating a very random idea here. :)
>
>Would there be interest in doing a bug bash for a week, where we
>aggressively close out some pesky bugs that have been lingering around.. If
>enough committers and contributors are around, we can move the needle. We
>could time this a week before cutting RC for next release.
>
>Thanks
>Vinoth


Re:[VOTE] Apache Hudi graduation to top level project

2020-05-06 Thread lamber-ken
+1




At 2020-05-07 04:55:48, "Vinoth Chandar"  wrote:
>Hello all,
>
>Per our discussion on the dev mailing list (
>https://lists.apache.org/thread.html/rc98303d9f09665af90ab517ea0baeb7c374e9a5478d8424311e285cd%40%3Cdev.hudi.apache.org%3E
>)
>
>I would like to call a VOTE for Apache Hudi graduating as a top level
>project.
>
>If this vote passes, the next step would be to submit the resolution below
>to the Incubator PMC, who would vote on sending it on to the Apache Board.
>
>Vote:
>[ ] +1 - Recommend graduation of Apache Hudi as a TLP
>[ ] -1 - Do not recommend graduation of Apache Hudi because...
>
>The VOTE is open for a minimum of 72 hours.
>
>Establish the Apache Hudi Project
>
>WHEREAS, the Board of Directors deems it to be in the best interests of the
>Foundation and consistent with the Foundation's purpose to establish a
>Project Management Committee charged with the creation and maintenance of
>open-source software, for distribution at no charge to the public, related
>to providing atomic upserts and incremental data streams on Big Data.
>
>NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC),
>to be known as the "Apache Hudi Project", be and hereby is established
>pursuant to Bylaws of the Foundation; and be it further
>
>RESOLVED, that the Apache Hudi Project be and hereby is responsible for the
>creation and maintenance of software related to providing atomic upserts
>and incremental data streams on Big Data; and be it further
>
>RESOLVED, that the office of "Vice President, Apache Hudi" be and hereby is
>created, the person holding such office to serve at the direction of the
>Board of Directors as the chair of the Apache Hudi Project, and to have
>primary responsibility for management of the projects within the scope of
>responsibility of the Apache Hudi Project; and be it further
>
>RESOLVED, that the persons listed immediately below be and hereby are appointed
>to serve as the initial members of the Apache Hudi Project:
>
> * Anbu Cheeralan   
>
> * Balaji Varadarajan
>
> * Bhavani Sudha Saktheeswaran   
>
> * Luciano Resende 
>
> * Nishith Agarwal
>
> * Prasanna Rajaperumal
>
> * Shaofeng Li   
>
> * Steve Blackmon  
>
> * Suneel Marthi  
>
> * Thomas Weise
>
> * Vino Yang   
>
> * Vinoth Chandar
>
>NOW, THEREFORE, BE IT FURTHER RESOLVED, that Vinoth Chandar be appointed to
>the office of Vice President, Apache Hudi, to serve in accordance with and
>subject to the direction of the Board of Directors and the Bylaws of the
>Foundation until death, resignation, retirement, removal or
>disqualification, or until a successor is appointed; and
>
>be it further
>
>RESOLVED, that the Apache Hudi Project be and hereby is tasked with the
>migration and rationalization of the Apache Incubator Hudi podling; and
>
>be it further
>
>RESOLVED, that all responsibilities pertaining to the Apache Incubator Hudi
>podling encumbered upon the Apache Incubator PMC are hereafter discharged.


Re:[RESULT] [VOTE] Apache Hudi graduation to top level project

2020-05-09 Thread lamber-ken
Congratulations!


Best,
Lamber-Ken




At 2020-05-10 13:57:10, "Vinoth Chandar"  wrote:
>Hello all,
>
>The vote proposing Apache Hudi graduating as a top level project has passed
>with the following +1. votes;
>
>*PPMC +1 votes (11)*
>
> * Bhavani Sudha Saktheeswaran
> * Suneel Marthi  (Mentor)
>* Anbu Cheeralan
>* Vino Yang
>* Nishith Agarwal
>* Prasanna Rajaperumal
>* Luciano Resende (Mentor)
>* Shaofeng Li
>* Balaji Varadarajan
>* Thomas Weise (Mentor)
>* Vinoth Chandar
>
>
>*Non PPMC +1 Votes (10)*
>
> *** Gary Li
> * Sivabalan Narayanan
> * Shiyan (Raymond) Xu
> * wxhjsxz
> * Lamber-ken
> * cooper
> * Y Ethan Guo
> * Tison
> * Pratyaksh Sharma
> * Shaofeng Shi
>
>
>21 +1 votes and zero -1 votes.
>
>
>Vote thread:
>https://lists.apache.org/thread.html/rc19cedf6bb423cfc24efea27d8952a2224370a7679bab74d05de19b2%40%3Cdev.hudi.apache.org%3E
>
>I will follow up with the next steps, which is a DISCUSS thread on IPMC
>general mailing list.
>
>Thanks all,
>Vinoth


Re:Re: Apache Hudi Graduation vote on general@incubator

2020-05-18 Thread lamber-ken



Great job, and good luck to the Apache Hudi project.




Best,
Lamber-Ken

At 2020-05-19 13:35:11, "Vinoth Chandar"  wrote:
>Folks,
>
>the vote has passed!
>https://lists.apache.org/thread.html/r86278a1a69bbf340fa028aca784869297bd20ab50a71f4006669cdb5%40%3Cgeneral.incubator.apache.org%3E
>
>
>I will follow up with the next step [1], which is to submit the resolution
>to the board.
>
>[1]
>https://incubator.apache.org/guides/graduation.html#submission_of_the_resolution_to_the_board
>
>On Sun, May 17, 2020 at 7:14 PM 岳伟  wrote:
>
>> +1 Graduate Apache Hudi from the Incubator
>>
>>
>>
>>
>> Harvey Yue
>>
>>
>> On 05/16/2020 22:49, hamid pirahesh wrote:
>> [x] +1 Graduate Apache Hudi from the Incubator.
>>
>> On Fri, May 15, 2020 at 7:06 PM Vinoth Chandar  wrote:
>>
>> Hello all,
>>
>> Just started the VOTE on the IPMC general list [1]
>>
>> If you are an IPMC member, you do a *binding *vote
>> If you are not, you can still do a *non-binding* vote
>>
>> Please take a moment to vote.
>>
>> [1]
>>
>>
>> https://lists.apache.org/thread.html/r8039c8eece636df8c81a24c26965f5c1556a3c6404de02912d6455b4%40%3Cgeneral.incubator.apache.org%3E
>>
>> Thanks
>> Vinoth
>>
>>


Re: hudi dependency conflicts for test

2020-05-21 Thread Lamber-Ken
Hello Jiang,

Please try the following demo; it needs Spark (>= 2.4.4).

--

export SPARK_HOME=/work/BigData/install/spark/spark-2.4.5-bin-hadoop2.7
${SPARK_HOME}/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'

import org.apache.spark.sql.functions._

val tableName = "hudi_mor_table"
val basePath = "file:///tmp/hudi_mor_table"

val hudiOptions = Map[String,String](
  "hoodie.upsert.shuffle.parallelism" -> "10",
  "hoodie.datasource.write.recordkey.field" -> "key",
  "hoodie.datasource.write.partitionpath.field" -> "dt",
  "hoodie.table.name" -> tableName,
  "hoodie.datasource.write.precombine.field" -> "timestamp"
)

// five sample rows with a key, a data column and a date partition
val inputDF = spark.range(0, 5).
  withColumn("key", $"id").
  withColumn("data", lit("data")).
  withColumn("timestamp", current_timestamp()).
  withColumn("dt", date_format($"timestamp", "yyyy-MM-dd"))

inputDF.write.format("org.apache.hudi").
  options(hudiOptions).
  mode("Overwrite").
  save(basePath)

spark.read.format("org.apache.hudi").load(basePath + "/*/*").show();

--

Best,
Lamber-Ken


On 2020/05/21 18:59:02, Lian Jiang  wrote: 
> Thanks Shiyan and Vinoth. Unfortunately, adding
> org.apache.spark:spark-avro_2.11:2.4.4 throws another version related
> exception:
> 
> java.lang.NoSuchMethodError:
> org.apache.avro.Schema.createUnion([Lorg/apache/avro/Schema;)Lorg/apache/avro/Schema;
>   at 
> org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:185)
>   at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:176)
>   at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:174)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
>   at 
> org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
>   at 
> org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:87)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:93)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>   at org.apache.spark.sql.DataFrameWriter.save(Data

Re: Subscribing to commits@

2020-05-21 Thread Lamber Ken
Thanks, very useful to me.

+1, I'd recommend it to anyone.

On 2020/02/28 01:00:36, Vinoth Chandar  wrote: 
> Folks,
> 
> Realized some folks may not have noticed this. But
> https://lists.apache.org/list.html?comm...@hudi.apache.org  has all the
> github/jira activity, in a single place..
> 
> If you are interested in helping others out on the community, please join
> that list for email notifications. That's how I keep track,
> 
> (If we could mirror #general this way from slack, it will be awesome)
> 


Re: Adding Usages to powered by page

2020-05-21 Thread Lamber Ken
Hello all,

In addition, it would be nice to include the official website in the comments;
that way, we can find the logo and add it to the powered_by page.

Thanks
Lamber-Ken

On 2020/05/21 15:51:58, Vinoth Chandar  wrote: 
> Hello all,
> 
> If you are using Apache Hudi and interested in featuring on the powered_by
> page, please let us know by leaving a comment here
> 
> https://github.com/apache/incubator-hudi/issues/661
> 
> As you know, we put in a lot of effort into the community.. Having a well
> maintained list, is a great way to attract more users and scale ourselves
> (more contributors, more faqs, better everything)..
> 
> So please take a moment..
> 
> Thanks
> Vinoth
> 


Re: spark-avro version issue with hudi

2020-05-26 Thread Lamber Ken
Hi, since hudi-0.5.2, Spark 2.4.4 is the recommended version. In Spark 2.4.4,
spark-avro was split out into a separate module and needs to be added to the
jars directory manually. Spark 2.3.3 is not compatible.

If you have other questions, contact WeChat 19941866946 to join the Chinese
WeChat group.
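
For reference, the launch command from the "hudi dependency conflicts for
test" thread above pulls the separate spark-avro module in explicitly
(assuming Hudi 0.5.2 and Spark 2.4.x):

${SPARK_HOME}/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'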

On 2020/05/26 07:23:02, "gaofeng5...@capinfo.com.cn" 
 wrote: 
> 
> 
> 
> The Spark version is 2.3.2.3.1.0.0-78, and the submitted code is:
> def main(args: Array[String]): Unit = {
>   val spark = SparkSession.builder.appName("Demo")
>     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>     .master("local[3]")
>     .getOrCreate()
>   //insert(spark)
>   update(spark)
>   query(spark)
>   //incrementalQueryPermalink(spark)
> 
>   spark.stop()
> }
> 
> /**
>  * Insert data
>  *
>  * @param spark
>  */
> def insert(spark: SparkSession): Unit = {
>   val tableName = "hudi_archive_test"
>   val pathRoot = "/Users/tangxiuhong"
>   val basePath = pathRoot + "/deltalake/hudi/"
>   val inserts = List(
>     """{"id" : 1, "name": "iteblog", "age" : 101, "ts" : 1, "dt" : "20191212"}""",
>     """{"id" : 2, "name": "iteblog_hadoop", "age" : 102, "ts" : 1, "dt" : "20191213"}""",
>     """{"id" : 3, "name": "hudi", "age" : 103, "ts" : 2, "dt" : "20191212"}""")
> 
>   //val inserts = List(
>   //  """{"id" : 4, "name": "iteblog", "age" : 102, "ts" : 2, "dt" : "20191212","addr" : "云南"}""",
>   //  """{"id" : 5, "name": "iteblog_hadoop", "age" : 103, "ts" : 2, "dt" : "20191213","addr" : "浙江"}""",
>   //  """{"id" : 6, "name": "hudi", "age" : 104, "ts" : 2, "dt" : "20191212","addr" : "云南"}""")
>   val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> 
>   df.write.format("org.apache.hudi")
>     // Set the primary key column name
>     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
>     // Set the column holding the data update time
>     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
>     // For multi-level partitions, this must be org.apache.hudi.keygen.ComplexKeyGenerator
>     .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator")
>     // Set the multi-level partition columns
>     .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "dt,ts")
>     // There are currently four index types: HBASE, INMEMORY, BLOOM, GLOBAL_BLOOM;
>     // the global GLOBAL_BLOOM must be set so records can still be found after a partition change
>     .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
>     // Set the index type (one of HBASE, INMEMORY, BLOOM, GLOBAL_BLOOM)
>     .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
>     // Parallelism settings
>     .option("hoodie.insert.shuffle.parallelism", "2")
>     .option("hoodie.upsert.shuffle.parallelism", "2")
>     // Table name
>     .option(HoodieWriteConfig.TABLE_NAME, tableName)
>     .mode(SaveMode.Append)
>     .save(basePath)
> }
> What causes the above error?
> 
> 
> gaofeng5...@capinfo.com.cn
> 


Re: spark-avro version issue with hudi

2020-05-26 Thread Lamber Ken
One more detail: images do not come through on the Apache mailing list.

Thanks

On 2020/05/26 07:23:02, "gaofeng5...@capinfo.com.cn" 
 wrote: 
> 
> 
> 
> The Spark version is 2.3.2.3.1.0.0-78, and the submitted code is:
> def main(args: Array[String]): Unit = {
>   val spark = SparkSession.builder.appName("Demo")
>     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>     .master("local[3]")
>     .getOrCreate()
>   //insert(spark)
>   update(spark)
>   query(spark)
>   //incrementalQueryPermalink(spark)
> 
>   spark.stop()
> }
> 
> /**
>  * Insert data
>  *
>  * @param spark
>  */
> def insert(spark: SparkSession): Unit = {
>   val tableName = "hudi_archive_test"
>   val pathRoot = "/Users/tangxiuhong"
>   val basePath = pathRoot + "/deltalake/hudi/"
>   val inserts = List(
>     """{"id" : 1, "name": "iteblog", "age" : 101, "ts" : 1, "dt" : "20191212"}""",
>     """{"id" : 2, "name": "iteblog_hadoop", "age" : 102, "ts" : 1, "dt" : "20191213"}""",
>     """{"id" : 3, "name": "hudi", "age" : 103, "ts" : 2, "dt" : "20191212"}""")
> 
>   //val inserts = List(
>   //  """{"id" : 4, "name": "iteblog", "age" : 102, "ts" : 2, "dt" : "20191212","addr" : "云南"}""",
>   //  """{"id" : 5, "name": "iteblog_hadoop", "age" : 103, "ts" : 2, "dt" : "20191213","addr" : "浙江"}""",
>   //  """{"id" : 6, "name": "hudi", "age" : 104, "ts" : 2, "dt" : "20191212","addr" : "云南"}""")
>   val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> 
>   df.write.format("org.apache.hudi")
>     // Set the primary key column name
>     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
>     // Set the column holding the data update time
>     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
>     // For multi-level partitions, this must be org.apache.hudi.keygen.ComplexKeyGenerator
>     .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator")
>     // Set the multi-level partition columns
>     .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "dt,ts")
>     // There are currently four index types: HBASE, INMEMORY, BLOOM, GLOBAL_BLOOM;
>     // the global GLOBAL_BLOOM must be set so records can still be found after a partition change
>     .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
>     // Set the index type (one of HBASE, INMEMORY, BLOOM, GLOBAL_BLOOM)
>     .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
>     // Parallelism settings
>     .option("hoodie.insert.shuffle.parallelism", "2")
>     .option("hoodie.upsert.shuffle.parallelism", "2")
>     // Table name
>     .option(HoodieWriteConfig.TABLE_NAME, tableName)
>     .mode(SaveMode.Append)
>     .save(basePath)
> }
> What causes the above error?
> 
> 
> gaofeng5...@capinfo.com.cn
> 


[DISCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-05-26 Thread Lamber Ken
Dear community,

Use case: A build fails due to an externality. The source is actually correct. 
It would build OK and pass if simply re-run. Is there some way to nudge 
Travis-CI to do another build, other than pushing a "dummy" commit?

The way I often use is `git commit --allow-empty -m 'trigger rebuild'`: push a
dummy commit and Travis will rebuild. I also noticed that some Apache projects
support retriggering via a PR comment (see the examples below).
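
For reference, the empty-commit workaround is just the following (a sketch;
the remote and branch names depend on your own PR setup):

# create an empty commit whose only purpose is to re-trigger CI
git commit --allow-empty -m 'trigger rebuild'

# push it to the PR branch; Travis sees the new commit and rebuilds
git push origin HEAD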

For example:
1. Carbondata use "retest this please"
https://github.com/apache/carbondata/pull/3387

2. Bookkeeper use "run pr validation"
https://github.com/apache/bookkeeper/pull/2158

But I can't find an effective solution in GitHub's and Travis's
documentation[1]. Any thoughts or opinions?

Best,
Lamber-Ken

[1] https://docs.travis-ci.com, https://support.github.com


Re: [DISCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-05-27 Thread Lamber Ken
Thanks Sivabalan.

Committers / PMC members can use these tools to trigger a rebuild directly.
Contributors can open the url, but the retrigger button is hidden for them.

Best,
Lamber-Ken


On 2020/05/27 13:13:53, Sivabalan  wrote: 
> Not sure if this is a common practice. But can't we trigger via travis-ci
> directly? You can go here
> <https://travis-ci.org/github/apache/hudi/pull_requests> or here
> <https://travis-ci.org/github/apache/hudi/builds> and there you can find an
> option to restart the build (right most column in every row) again if need
> be. Wouldn't this suffice?
> 
> On Wed, May 27, 2020 at 5:50 AM vino yang  wrote:
> 
> > Hi Lamber-Ken,
> >
> > Thanks for opening this discussion.
> >
> > +1 to fix this issue.
> >
> > About the solution, can we consider to introduce a "CI Bot" just like
> > Apache Flink has done?[1]
> >
> > Just a thought.
> >
> > Best,
> > Vino
> >
> > [1]: https://github.com/flink-ci/ci-bot/
> >
> > Lamber Ken  于2020年5月27日周三 下午2:08写道:
> >
> > > Dear community,
> > >
> > > Use case: A build fails due to an externality. The source is actually
> > > correct. It would build OK and pass if simply re-run. Is there some way
> > to
> > > nudge Travis-CI to do another build, other than pushing a "dummy" commit?
> > >
> > > The way I often used is `git commit --allow-empty -m 'trigger rebuild'`,
> > > push a dummy commit, the travis will rebuild. Also noticed some apache
> > > projects have supported this feature.
> > >
> > > For example:
> > > 1. Carbondata use "retest this please"
> > > https://github.com/apache/carbondata/pull/3387
> > >
> > > 2. Bookkeeper use "run pr validation"
> > > https://github.com/apache/bookkeeper/pull/2158
> > >
> > > But, I can't find a effective solution from Github and Travis's
> > > documentation[1], any thoughts or opinions?
> > >
> > > Best,
> > > Lamber-Ken
> > >
> > > [1] https://docs.travis-ci.com, https://support.github.com
> > >
> >
> 
> 
> -- 
> Regards,
> -Sivabalan
> 


Re: [DISCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-05-27 Thread Lamber Ken
Hi Prashant Wason,

The "git push -f" command has no side effect, but the purpose of this topic is
to reduce manual operations. A CI bot would help us retry failed tests.

For example, when the "retest this please" command is used, the CI retries
automatically:
1. Carbondata use "retest this please"
https://github.com/apache/carbondata/pull/3387

Thanks,
Lamber-Ken

On 2020/05/27 21:24:57, Prashant Wason  wrote: 
> I have used force push (git push -f) to re-trigger Travis build. I don't
> know if force push has any side effect but it does save an extra commit.
> 
> Thanks
> Prashant
> 
> 
> On Wed, May 27, 2020 at 11:11 AM Lamber Ken  wrote:
> 
> > Thanks Sivabalan
> >
> > For committers / pmcs, they can use these tools to trigger rebuilid
> > directly,
> > But for contributors, they can open the url, but the retrigger button will
> > hidden.
> >
> > Best,
> > Lamber-Ken
> >
> >
> > On 2020/05/27 13:13:53, Sivabalan  wrote:
> > > Not sure if this is a common practice. But can't we trigger via travis-ci
> > > directly? You can go here
> > > <
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.org_github_apache_hudi_pull-5Frequests&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=xtI8Sm__cNsr2SNcNOd1TvHfk6eCk-zcl3mn1IagbGE&s=PTLkzOm-ocf96YucyzwNrhJ_yfQ3EB4zuNQSttiv6ow&e=
> > > or here
> > > <
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.org_github_apache_hudi_builds&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=xtI8Sm__cNsr2SNcNOd1TvHfk6eCk-zcl3mn1IagbGE&s=wLjF5KLCkmXbjanjyE825M7qaSu2zf4qy2aUycf14ok&e=
> > > and there you can find an
> > > option to restart the build (right most column in every row) again if
> > need
> > > be. Wouldn't this suffice?
> > >
> > > On Wed, May 27, 2020 at 5:50 AM vino yang  wrote:
> > >
> > > > Hi Lamber-Ken,
> > > >
> > > > Thanks for opening this discussion.
> > > >
> > > > +1 to fix this issue.
> > > >
> > > > About the solution, can we consider to introduce a "CI Bot" just like
> > > > Apache Flink has done?[1]
> > > >
> > > > Just a thought.
> > > >
> > > > Best,
> > > > Vino
> > > >
> > > > [1]:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_flink-2Dci_ci-2Dbot_&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=xtI8Sm__cNsr2SNcNOd1TvHfk6eCk-zcl3mn1IagbGE&s=Inll6__DlgwNzNDUKRbIhsW82R5SOdV45WMJQc4bS9o&e=
> > > >
> > > > Lamber Ken  于2020年5月27日周三 下午2:08写道:
> > > >
> > > > > Dear community,
> > > > >
> > > > > Use case: A build fails due to an externality. The source is actually
> > > > > correct. It would build OK and pass if simply re-run. Is there some
> > way
> > > > to
> > > > > nudge Travis-CI to do another build, other than pushing a "dummy"
> > commit?
> > > > >
> > > > > The way I often used is `git commit --allow-empty -m 'trigger
> > rebuild'`,
> > > > > push a dummy commit, the travis will rebuild. Also noticed some
> > apache
> > > > > projects have supported this feature.
> > > > >
> > > > > For example:
> > > > > 1. Carbondata use "retest this please"
> > > > >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_carbondata_pull_3387&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=xtI8Sm__cNsr2SNcNOd1TvHfk6eCk-zcl3mn1IagbGE&s=vFZU-5zKAkGzQ9IgZEDjmjis0VRW4k2FvIPtMh6OfJY&e=
> > > > >
> > > > > 2. Bookkeeper use "run pr validation"
> > > > >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_bookkeeper_pull_2158&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=xtI8Sm__cNsr2SNcNOd1TvHfk6eCk-zcl3mn1IagbGE&s=bn4KGvHB9vXo7XsNhM4rcS_j1zK0m838kS4n0Fba0Bk&e=
> > > > >
> > > > > But, I can't find a effective solution from Github and Travis's
> > > > > documentation[1], any thoughts or opinions?
> > > > >
> > > > > Best,
> > > > > Lamber-Ken
> > > > >
> > > > > [1]
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.travis-2Dci.com&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=xtI8Sm__cNsr2SNcNOd1TvHfk6eCk-zcl3mn1IagbGE&s=sOMap9ncNGc2hArVtrms1e4f7kLrLA0r9sfbFaFws0w&e=
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__support.github.com&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=xtI8Sm__cNsr2SNcNOd1TvHfk6eCk-zcl3mn1IagbGE&s=zQCZcD0hb-5FqLkW5W_jX0BF1ET48sQa1vpEZq7LXmU&e=
> > > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > -Sivabalan
> > >
> >
> 


Re: delete the test module hudi-integ-test

2020-05-28 Thread Lamber Ken
Based on hongdd's reply: for a local build, you can use

mvn clean install -DskipTests -DskipITs -Dcheckstyle.skip=true -Drat.skip=true

Best

On 2020/05/28 10:57:17, cooper  wrote: 
> dear all:
> when I build the project, the following error occurred; could we delete the
> unimported module?
> 
> [ERROR] Failed to execute goal
> org.codehaus.mojo:exec-maven-plugin:1.6.0:exec (Setup HUDI_WS) on project
> hudi-integ-test: Command execution failed.: Cannot run program "\bin\bash"
> (in directory "D:\code-repository\github\hudi\hudi-integ-test"):
> CreateProcess error=2, the system cannot find the specified file -> [Help 1]
> 


Re: [DISCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-05-31 Thread Lamber Ken
Hi folks,

I learned from the Travis and GitHub Actions API docs these days and used my
own project as a demo[1]. The demo pull request will always fail; comment with
the "rerun tests" command and it will rerun the tests automatically.

If you are interested, try it.

Best,
Lamber-Ken

[1] https://github.com/lamber-ken/hdocs/pull/36
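
P.S. Under the hood, a bot like this boils down to listening for the trigger
comment and restarting the build through the Travis v3 API; a minimal sketch
(TRAVIS_TOKEN and BUILD_ID are placeholders you would have to supply):

# restart an existing Travis build via the v3 API
curl -s -X POST \
  -H "Travis-API-Version: 3" \
  -H "Authorization: token ${TRAVIS_TOKEN}" \
  "https://api.travis-ci.org/build/${BUILD_ID}/restart"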


On 2020/05/27 06:08:05, Lamber Ken  wrote: 
> Dear community,
> 
> Use case: A build fails due to an externality. The source is actually 
> correct. It would build OK and pass if simply re-run. Is there some way to 
> nudge Travis-CI to do another build, other than pushing a "dummy" commit?
> 
> The way I often used is `git commit --allow-empty -m 'trigger rebuild'`, push 
> a dummy commit, the travis will rebuild. Also noticed some apache projects 
> have supported this feature.
> 
> For example:
> 1. Carbondata use "retest this please"
> https://github.com/apache/carbondata/pull/3387
> 
> 2. Bookkeeper use "run pr validation"
> https://github.com/apache/bookkeeper/pull/2158
> 
> But, I can't find a effective solution from Github and Travis's 
> documentation[1], any thoughts or opinions?
> 
> Best,
> Lamber-Ken
> 
> [1] https://docs.travis-ci.com, https://support.github.com
> 


Re: TLP Announcement

2020-06-04 Thread Lamber Ken
Great news, thank you all, and congratulations.

On 2020/06/04 14:28:33, Vinoth Chandar  wrote: 
> Hello all,
> 
> The ASF press release announcing Apache Hudi as TLP is live! Thanks for all
> your contributions! We could not have been achieved that without such a
> great community effort!
> 
> Please help spread the word!
> 
> - GlobeNewswire
> http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
>  - ASF "Foundation" blog https://s.apache.org/odtwv
>  - @TheASF twitter feed
> https://twitter.com/TheASF/status/1268528110959497217
>  - The ASF on LinkedIn
> https://www.linkedin.com/company/the-apache-software-foundation
> 
> Thanks
> Vinoth
> 


Re: [DISCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-06-05 Thread Lamber Ken
Hi Vinoth,

Based on the discussion above, I came up with an interesting idea: introduce a
robot that builds the testing website automatically. If folks don't want to
build the staging site themselves, we can introduce a bot which builds the
testing website and pushes the site automatically.

I have already raised a JIRA and opened a PR for this:
https://github.com/apache/hudi/pull/1706
https://issues.apache.org/jira/browse/HUDI-998

Thanks,
Lamber-Ken

On 2020/06/01 15:14:32, Vinoth Chandar  wrote: 
> Great!  I left some comment on the PR. around licensing and maintenance
> overhead.
> 
> On Sun, May 31, 2020 at 11:51 PM Lamber Ken  wrote:
> 
> > Hi forks,
> >
> > Learned from travis and github actions api docs these days, I used my
> > project as a demo[1],
> > the demo pull request will always fail, please use "rerun tests" command,
> > it will rerun tests automatically.
> >
> > if you are interested, try it.
> >
> > Best,
> > Lamber-Ken
> >
> > [1] https://github.com/lamber-ken/hdocs/pull/36
> >
> >
> > On 2020/05/27 06:08:05, Lamber Ken  wrote:
> > > Dear community,
> > >
> > > Use case: A build fails due to an externality. The source is actually
> > correct. It would build OK and pass if simply re-run. Is there some way to
> > nudge Travis-CI to do another build, other than pushing a "dummy" commit?
> > >
> > > The way I often used is `git commit --allow-empty -m 'trigger rebuild'`,
> > push a dummy commit, the travis will rebuild. Also noticed some apache
> > projects have supported this feature.
> > >
> > > For example:
> > > 1. Carbondata use "retest this please"
> > > https://github.com/apache/carbondata/pull/3387
> > >
> > > 2. Bookkeeper use "run pr validation"
> > > https://github.com/apache/bookkeeper/pull/2158
> > >
> > > But, I can't find a effective solution from Github and Travis's
> > documentation[1], any thoughts or opinions?
> > >
> > > Best,
> > > Lamber-Ken
> > >
> > > [1] https://docs.travis-ci.com, https://support.github.com
> > >
> >
> 


Re: hudi incompatibility with spark 2.3

2020-06-06 Thread lamber-ken



Hello,


Since version 0.5.2, only spark-2.4.4 is supported and the Avro version is
1.8.2, so please upgrade to spark-2.4.4 before using Hudi.
P.S. Would you mind adding me on WeChat? WeChat ID: xleesf, to join the
largest Chinese WeChat group.




At 2020-06-06 18:15:20, "gaofeng5...@capinfo.com.cn" wrote:

Our big data cluster's Spark version is 2.3, and executing the Hudi code
reports an error:
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.avro.Schema.createUnion([Lorg/apache/avro/Schema;)Lorg/apache/avro/Schema;


The complete code is:
def main(args: Array[String]): Unit = {
  System.setProperty("HADOOP_USER_NAME", "spark")
  val spark = SparkSession.builder.appName("Demo")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .master("local[3]")
    .getOrCreate()
  insert(spark)
  //update(spark)
  //query(spark)
  //incrementalQueryPermalink(spark)

  spark.stop()
}


def insert(spark: SparkSession): Unit = {
  val tableName = "hudi_archive_test"
  val pathRoot = "/Users/tangxiuhong"
  val basePath = pathRoot + "/deltalake/hudi/"
  val inserts = List(
    """{"id" : 1, "name": "iteblog", "age" : 101, "ts" : 1, "dt" : "20191212"}""",
    """{"id" : 2, "name": "iteblog_hadoop", "age" : 102, "ts" : 1, "dt" : "20191213"}""",
    """{"id" : 3, "name": "hudi", "age" : 103, "ts" : 2, "dt" : "20191212"}""")

  //val inserts = List(
  //  """{"id" : 4, "name": "iteblog", "age" : 102, "ts" : 2, "dt" : "20191212","addr" : "云南"}""",
  //  """{"id" : 5, "name": "iteblog_hadoop", "age" : 103, "ts" : 2, "dt" : "20191213","addr" : "浙江"}""",
  //  """{"id" : 6, "name": "hudi", "age" : 104, "ts" : 2, "dt" : "20191212","addr" : "云南"}""")
  val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
  df.show(10)
  df.write.format("org.apache.hudi")
    // Set the primary key column name
    .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
    // Set the column holding the data update time
    .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
    // For multi-level partitions, this must be org.apache.hudi.keygen.ComplexKeyGenerator
    .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator")
    // Set the multi-level partition columns
    .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "dt,ts")
    // There are currently four index types: HBASE, INMEMORY, BLOOM, GLOBAL_BLOOM;
    // the global GLOBAL_BLOOM must be set so records can still be found after a partition change
    .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
    // Set the index type (one of HBASE, INMEMORY, BLOOM, GLOBAL_BLOOM)
    .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
    // Parallelism settings
    .option("hoodie.insert.shuffle.parallelism", "2")
    .option("hoodie.upsert.shuffle.parallelism", "2")
    // Table name
    .option(HoodieWriteConfig.TABLE_NAME, tableName)
    .mode(SaveMode.Append)
    .save(basePath)
}


Error screenshot: [image did not come through on the mailing list]


How should I handle this problem?








gaofeng5...@capinfo.com.cn

Re:Re: [VOTE] Release 0.5.3, release candidate #2

2020-06-11 Thread lamber-ken
+1


At 2020-06-11 15:10:13, "Bhavani Sudha"  wrote:
>+1 (binding)
>
>Downloaded tar and verified compile [OK]
>
>Run integration test locally. [OK]
>
>Run a few tests in IDE. [OK]
>
>Run quickstart [OK]
>
>Verify NOTICE and LICENSE exists [OK]
>
>Check Checksum [OK]
>
>Check no Binary files in source release [OK]
>
>Rat Check Passed [OK]
>
>
>Thanks,
>
>Sudha
>
>On Wed, Jun 10, 2020 at 2:57 PM Sivabalan  wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #2 for the version 0.5.3,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>  The complete staging area is available for your review, which includes:
>>
>> * JIRA release notes [1],
>> * the official Apache source release and binary convenience releases to be
>> deployed to dist.apache.org [2], which are signed with the key with
>> fingerprint 001B66FA2B2543C151872CCC29A4FD82F1508833 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "release-0.5.3-rc2" [5],
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>>
>> Thanks,
>> Release Manager
>>
>> [1]
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256
>>
>> [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.5.3-rc2/
>>
>> [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
>>
>> [4] https://repository.apache.org/content/repositories/orgapachehudi-1023/
>>
>> [5] https://github.com/apache/hudi/tree/release-0.5.3-rc2
>>


Re:Congrats to our newest committers!

2021-01-27 Thread lamber-ken
Congratulations to Wang Xianghu and Li Wei, great job!


At 2021-01-27 19:16:24, "leesf"  wrote:
>Hi all,
>
>I am very happy to announce our newest committers.
>
>Wang Xianghu: Xianghu has done a great job in decoupling hudi from spark and
>implemented the first version of flink support, and contributed bug fixes; he
>is also very active in answering user questions in the China WeChat group.
>
>Li Wei: Liwei has also done a great job in driving major features like
>RFC-19 together with Satish, and has contributed many features and bug fixes
>in core modules.
>
>Please join me in congratulating them!
>
>Thanks,
>Leesf