[GitHub] [hudi] nsivabalan commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-15 Thread GitBox


nsivabalan commented on issue #5298:
URL: https://github.com/apache/hudi/issues/5298#issuecomment-1100580381

   @kasured : Before I dive in, a few pointers on the write configs used.
   1. I see you have enabled both inline and async compaction. With a 
streaming sink to Hudi, I believe only async compaction is possible, and for 
MOR tables Hudi runs async compaction automatically. So you can probably 
remove these configs:
   ```
   "hoodie.compact.inline" = "true"
   "hoodie.datasource.compaction.async.enable" = "true"
   ```
   
   2. I also see you have enabled clustering. Can we disable clustering and 
see whether the issue is still reproducible?
   
   With these changes, can you let us know whether the problem still persists?
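   For illustration only, the two suggestions could be applied to a writer 
options map roughly as below. This is a minimal sketch: the table name, the 
table type entry, and the clustering keys (`hoodie.clustering.inline`, 
`hoodie.clustering.async.enabled`) are assumptions, since the reporter's full 
config is not shown here.

```python
# Hypothetical writer options. Only the two compaction keys come from the
# comment above; everything else is an assumption for illustration.
original_options = {
    "hoodie.table.name": "example_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.compact.inline": "true",
    "hoodie.datasource.compaction.async.enable": "true",
    "hoodie.clustering.inline": "true",
}

# Keys suggested for removal (suggestion 1).
COMPACTION_KEYS = {
    "hoodie.compact.inline",
    "hoodie.datasource.compaction.async.enable",
}

# Keys assumed to control clustering (suggestion 2).
CLUSTERING_KEYS = {
    "hoodie.clustering.inline",
    "hoodie.clustering.async.enabled",
}


def simplify_for_repro(options):
    """Return a copy with compaction overrides dropped and clustering off."""
    trimmed = {k: v for k, v in options.items() if k not in COMPACTION_KEYS}
    for key in CLUSTERING_KEYS:
        if key in trimmed:
            trimmed[key] = "false"
    return trimmed


repro_options = simplify_for_repro(original_options)
for key, value in sorted(repro_options.items()):
    print(f"{key} = {value}")
```

   The resulting map can then be passed to the writer for the repro run; the 
original map is left untouched so the two runs can be compared.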
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] puchangchun commented on issue #4825: [SUPPORT] flink hudi some class not found

2022-04-15 Thread GitBox


puchangchun commented on issue #4825:
URL: https://github.com/apache/hudi/issues/4825#issuecomment-1100577825

   It runs fine locally, but I get this error in the Flink cluster 
environment, even though my jar already includes HiveConf.class.





[GitHub] [hudi] nsivabalan commented on issue #5326: [SUPPORT] prometheus metrics labels

2022-04-15 Thread GitBox


nsivabalan commented on issue #5326:
URL: https://github.com/apache/hudi/issues/5326#issuecomment-1100577811

   @zxding : I guess you are asking to add arbitrary tags to each metric, 
right? 





[GitHub] [hudi] nsivabalan commented on issue #5326: [SUPPORT] prometheus metrics labels

2022-04-15 Thread GitBox


nsivabalan commented on issue #5326:
URL: https://github.com/apache/hudi/issues/5326#issuecomment-1100577643

   @harsh1231 : Can you chime in here please. 





[jira] [Updated] (HUDI-3892) Add HoodieReadClient with java

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3892:
--
Description: 
We might need a hoodie read client in java similar to the one we have for 
spark. 

 

 

[Apache Pulsar|https://github.com/apache/pulsar] is integrating with Hudi, 
using Hudi as tiered storage to offload cold topic data. When consumers fetch 
cold data from a topic, the Pulsar broker determines whether the target data 
is stored in Pulsar or in tiered storage. If the target data is stored in 
tiered storage (Hudi), the Pulsar broker fetches it from Hudi through the Java 
API, packages it into the Pulsar format, and dispatches it to the consumer.

However, we found that the current Hudi implementation doesn't support reading 
Hudi table records through the Java API, so we can't read the target data from 
Hudi into the Pulsar broker, which blocks the Pulsar & Hudi integration.
h3. What we need
 # We need Hudi to support reading records through the Java API.
 # We need Hudi to support reading records in writer order, or ordered by 
specific fields.

  was:
We might need a hoodie read client in java similar to the one we have for 
spark. 

 


> Add HoodieReadClient with java
> --
>
> Key: HUDI-3892
> URL: https://issues.apache.org/jira/browse/HUDI-3892
> Project: Apache Hudi
>  Issue Type: Task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Critical
> Fix For: 0.12.0
>
>
> We might need a hoodie read client in java similar to the one we have for 
> spark. 
>  
>  
> [Apache Pulsar|https://github.com/apache/pulsar] is integrating with Hudi, 
> using Hudi as tiered storage to offload cold topic data. When consumers 
> fetch cold data from a topic, the Pulsar broker determines whether the 
> target data is stored in Pulsar or in tiered storage. If the target data is 
> stored in tiered storage (Hudi), the Pulsar broker fetches it from Hudi 
> through the Java API, packages it into the Pulsar format, and dispatches it 
> to the consumer.
> However, we found that the current Hudi implementation doesn't support 
> reading Hudi table records through the Java API, so we can't read the target 
> data from Hudi into the Pulsar broker, which blocks the Pulsar & Hudi 
> integration.
> h3. What we need
>  # We need Hudi to support reading records through the Java API.
>  # We need Hudi to support reading records in writer order, or ordered by 
> specific fields.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] nsivabalan commented on issue #5313: [SUPPORT] Do we have plan to support java reader for Hudi?

2022-04-15 Thread GitBox


nsivabalan commented on issue #5313:
URL: https://github.com/apache/hudi/issues/5313#issuecomment-1100577433

   @hangc0276 : We can definitely take this up. Excited to see Hudi used as 
tiered storage :) 
   As @simonsssu showed interest in working on it, I will coordinate with 
him/her and get this going. 





[GitHub] [hudi] nsivabalan commented on issue #5313: [SUPPORT] Do we have plan to support java reader for Hudi?

2022-04-15 Thread GitBox


nsivabalan commented on issue #5313:
URL: https://github.com/apache/hudi/issues/5313#issuecomment-1100577200

   Cool. @simonsssu : I have created a tracking Jira 
[here](https://issues.apache.org/jira/browse/HUDI-3892). Can you let me know 
your Jira ID so I can assign it to you? Also, this might be time sensitive, 
since it's blocking the Pulsar integration. Just wanted to send out a gentle 
reminder. 
   Once you have the patch, do ping me. I can help review it. 
   





[jira] [Updated] (HUDI-3892) Add HoodieReadClient with java

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3892:
--
Priority: Critical  (was: Major)

> Add HoodieReadClient with java
> --
>
> Key: HUDI-3892
> URL: https://issues.apache.org/jira/browse/HUDI-3892
> Project: Apache Hudi
>  Issue Type: Task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Critical
> Fix For: 0.12.0
>
>
> We might need a hoodie read client in java similar to the one we have for 
> spark. 
>  





[jira] [Updated] (HUDI-3892) Add HoodieReadClient with java

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3892:
--
Fix Version/s: 0.12.0

> Add HoodieReadClient with java
> --
>
> Key: HUDI-3892
> URL: https://issues.apache.org/jira/browse/HUDI-3892
> Project: Apache Hudi
>  Issue Type: Task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 0.12.0
>
>
> We might need a hoodie read client in java similar to the one we have for 
> spark. 
>  





[jira] [Created] (HUDI-3892) Add HoodieReadClient with java

2022-04-15 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3892:
-

 Summary: Add HoodieReadClient with java
 Key: HUDI-3892
 URL: https://issues.apache.org/jira/browse/HUDI-3892
 Project: Apache Hudi
  Issue Type: Task
  Components: reader-core
Reporter: sivabalan narayanan


We might need a hoodie read client in java similar to the one we have for 
spark. 

 





[GitHub] [hudi] nsivabalan commented on issue #5301: [SUPPORT]Support Show Data Files Command Based on Call Procedure Command for Spark SQL

2022-04-15 Thread GitBox


nsivabalan commented on issue #5301:
URL: https://github.com/apache/hudi/issues/5301#issuecomment-1100576398

   @XuQianJin-Stars : Can you please file a tracking Jira, follow up there, 
   and close out the GitHub issue? 
   





[GitHub] [hudi] nsivabalan commented on issue #5291: [SUPPORT] How to use hudi-defaults.conf with Glue

2022-04-15 Thread GitBox


nsivabalan commented on issue #5291:
URL: https://github.com/apache/hudi/issues/5291#issuecomment-1100576020

   @zhedoubushishi : can you chime in here please.





[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

2022-04-15 Thread GitBox


nsivabalan commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1100574209

   Interesting. What's your lifecycle policy, btw? Is it that any object not 
updated in the last X days gets deleted? 





[GitHub] [hudi] nsivabalan commented on issue #5262: [SUPPORT] Deltastreamer Error upserting bucketType UPDATE for partition :0

2022-04-15 Thread GitBox


nsivabalan commented on issue #5262:
URL: https://github.com/apache/hudi/issues/5262#issuecomment-1100572710

   @stym06 : Most likely the schema has changed. Can you inspect it and let 
us know if that's the case? Related Jira: 
https://issues.apache.org/jira/browse/HUDI-1711





[GitHub] [hudi] nsivabalan commented on issue #5258: [SUPPORT] Write hudi data throws NoSuchMethodError with spark v2.4.4 and hudi v0.10.1

2022-04-15 Thread GitBox


nsivabalan commented on issue #5258:
URL: https://github.com/apache/hudi/issues/5258#issuecomment-1100571599

   Can you try with the Scala 2.11 bundle and let us know if it succeeds? 
   hudi-spark-bundle_2.11-0.10.1.jar
   And for spark-avro, can you try setting it via `--packages 
org.apache.spark:spark-avro_2.11:2.4.4`?
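   As a rough sketch of what that invocation might look like, the snippet 
below assembles a `spark-submit` argument list with matching Scala, Spark, and 
Hudi versions. The helper name `spark_submit_command`, the use of `--jars` for 
the bundle, and the application jar `my-app.jar` are assumptions for 
illustration; only the bundle name and the `--packages` coordinate come from 
the comment above.

```python
def spark_submit_command(app_jar,
                         scala_version="2.11",
                         spark_version="2.4.4",
                         hudi_version="0.10.1"):
    """Build a spark-submit argument list with consistent artifact versions.

    Keeping the Scala suffix identical across the Hudi bundle and spark-avro
    is the point: mixing 2.11 and 2.12 artifacts is a common source of
    NoSuchMethodError at runtime.
    """
    avro_package = f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"
    hudi_bundle = f"hudi-spark-bundle_{scala_version}-{hudi_version}.jar"
    return [
        "spark-submit",
        "--packages", avro_package,
        "--jars", hudi_bundle,   # assumed: bundle jar is available locally
        app_jar,                 # hypothetical application jar
    ]


command = spark_submit_command("my-app.jar")
print(" ".join(command))
```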





[GitHub] [hudi] nsivabalan commented on issue #5249: [SUPPORT] Deltastreamer job does not terminate on Kubernetes when hoodie.metrics.on=true

2022-04-15 Thread GitBox


nsivabalan commented on issue #5249:
URL: https://github.com/apache/hudi/issues/5249#issuecomment-1100570805

   @harsh1231 : Can you take a stab at this please. 





[GitHub] [hudi] nsivabalan commented on issue #5248: [QUESION] Should filter prop "hoodie.datasource.write.operation" when use spark sql create table?

2022-04-15 Thread GitBox


nsivabalan commented on issue #5248:
URL: https://github.com/apache/hudi/issues/5248#issuecomment-1100570586

   @XuQianJin-Stars : Can you file a tracking jira and follow up on the issue. 
seems like we need to fix this. 





[GitHub] [hudi] nsivabalan commented on issue #5242: [SUPPORT] Hudi embedded timeline server in 0.9 vs 0.10 with `hoodie.embed.timeline.server.port`

2022-04-15 Thread GitBox


nsivabalan commented on issue #5242:
URL: https://github.com/apache/hudi/issues/5242#issuecomment-1100570312

   @yihua : timeline server port related issue. Can you chime in here please.





[GitHub] [hudi] nsivabalan commented on issue #5233: [SUPPORT] _hoodie_is_deleted not working for spark Datasource.

2022-04-15 Thread GitBox


nsivabalan commented on issue #5233:
URL: https://github.com/apache/hudi/issues/5233#issuecomment-1100569034

   Did you set the default value for "_hoodie_is_deleted" to null or false? 
Can you post the schema for the table? 
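   To make the question concrete, here are the two ways the field might be 
declared in an Avro schema, written as Python dicts. This is only a sketch: 
the record name `ExampleRecord` and the `id` field are hypothetical, and the 
reporter's actual schema is not known here. Note that in Avro a union's 
default has to match the first branch, so a null default requires `"null"` to 
be listed first.

```python
import json


def is_deleted_field(nullable=False):
    """Return an Avro field definition for the `_hoodie_is_deleted` marker."""
    if nullable:
        # Nullable variant: "null" is the first union branch, so the
        # default must be null (None in Python, null in JSON).
        return {
            "name": "_hoodie_is_deleted",
            "type": ["null", "boolean"],
            "default": None,
        }
    # Plain boolean variant with an explicit false default.
    return {"name": "_hoodie_is_deleted", "type": "boolean", "default": False}


# Hypothetical record schema; only the marker field name is from the thread.
schema = {
    "type": "record",
    "name": "ExampleRecord",
    "fields": [
        {"name": "id", "type": "string"},
        is_deleted_field(nullable=False),
    ],
}

schema_json = json.dumps(schema, indent=2)
print(schema_json)
```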





[GitHub] [hudi] nsivabalan closed issue #5231: [SUPPORT] Inconsistent query result using GetLatestBaseFiles compared to Snapshot Query

2022-04-15 Thread GitBox


nsivabalan closed issue #5231: [SUPPORT] Inconsistent query result using 
GetLatestBaseFiles compared to Snapshot Query
URL: https://github.com/apache/hudi/issues/5231





[GitHub] [hudi] nsivabalan commented on issue #5231: [SUPPORT] Inconsistent query result using GetLatestBaseFiles compared to Snapshot Query

2022-04-15 Thread GitBox


nsivabalan commented on issue #5231:
URL: https://github.com/apache/hudi/issues/5231#issuecomment-1100568785

   Thanks @alexeykudinkin for finding the root cause and fixing it. 





[GitHub] [hudi] nsivabalan commented on issue #5211: [SUPPORT] Glob pattern to pick specific subfolders not working while reading in Spark

2022-04-15 Thread GitBox


nsivabalan commented on issue #5211:
URL: https://github.com/apache/hudi/issues/5211#issuecomment-1100568451

   So you want to read multiple hudi tables w/ one spark.read? 





[GitHub] [hudi] nsivabalan commented on issue #5198: [SUPPORT] Querying data genereated by TimestampBasedKeyGenerator failed to parse timestamp in EPOCHMILLISECONDS column to date format

2022-04-15 Thread GitBox


nsivabalan commented on issue #5198:
URL: https://github.com/apache/hudi/issues/5198#issuecomment-1100568252

   @babumahesh-koo : Do you have any updates on your end?





[GitHub] [hudi] nsivabalan commented on issue #5189: [SUPPORT] Multiple chaining of hudi tables via incremental source results in duplicate partition meta column

2022-04-15 Thread GitBox


nsivabalan commented on issue #5189:
URL: https://github.com/apache/hudi/issues/5189#issuecomment-1100568175

   @harsh1231 : In the meantime (until @bvaradar responds), can you 
investigate why we are running into the duplicate column issue? 





[GitHub] [hudi] hudi-bot commented on pull request #5337: [HUDI-3891] Fixing files partitioning sequence for `BaseFileOnlyRelation`

2022-04-15 Thread GitBox


hudi-bot commented on PR #5337:
URL: https://github.com/apache/hudi/pull/5337#issuecomment-1100525225

   
   ## CI report:
   
   * 3da31d0812e520a29079c628c7a134bc66f066f1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8085)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5337: [HUDI-3891] Fixing files partitioning sequence for `BaseFileOnlyRelation`

2022-04-15 Thread GitBox


hudi-bot commented on PR #5337:
URL: https://github.com/apache/hudi/pull/5337#issuecomment-1100510773

   
   ## CI report:
   
   * 3da31d0812e520a29079c628c7a134bc66f066f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8085)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on pull request #5057: [HUDI-3651] optimize the hoodie hive client and ddl executor code wit…

2022-04-15 Thread GitBox


danny0405 commented on PR #5057:
URL: https://github.com/apache/hudi/pull/5057#issuecomment-1100510685

   @JerryYue-M You may need to rebase the code onto the latest master. I will 
take a look soon ~





[GitHub] [hudi] danny0405 commented on pull request #5087: [HUDI-3614] [DO_NOT_MERGE]Replace List with HoodieData in HoodieFlink/JavaTable and commit executors

2022-04-15 Thread GitBox


danny0405 commented on PR #5087:
URL: https://github.com/apache/hudi/pull/5087#issuecomment-1100510039

   > @danny0405 : can you follow up on the patch when you get a chance. guess 
author is waiting for review follow up from you. a gentle reminder.
   
   I don't see any gains from this at the current stage of the code besides 
the reduction in duplicate code, and the patch introduces a performance 
regression because of unnecessary object copies. So I'm not sure we should 
work in this direction.





[GitHub] [hudi] hudi-bot commented on pull request #5337: [HUDI-3891] Fixing files partitioning sequence for `BaseFileOnlyRelation`

2022-04-15 Thread GitBox


hudi-bot commented on PR #5337:
URL: https://github.com/apache/hudi/pull/5337#issuecomment-1100509989

   
   ## CI report:
   
   * 3da31d0812e520a29079c628c7a134bc66f066f1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3891:
-
Labels: pull-request-available  (was: )

> Investigate Hudi vs Raw Parquet table discrepancy
> -
>
> Key: HUDI-3891
> URL: https://issues.apache.org/jira/browse/HUDI-3891
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Critical
>  Labels: pull-request-available
>
> While benchmarking queries on raw Parquet tables against Hudi tables, I ran 
> the test against the same (Hudi) table:
>  # In one query path I'm reading it as just a raw Parquet table
>  # In another, I'm reading it as a Hudi RO (read_optimized) table
> Surprisingly enough, those two diverge in the number of files being read:
>  
> _Raw Parquet_
> !https://t18029943.p.clickup-attachments.com/t18029943/f700a129-35bc-4aaa-948c-9495392653f2/Screen%20Shot%202022-04-15%20at%205.20.41%20PM.png!
>  
> _Hudi_
> !https://t18029943.p.clickup-attachments.com/t18029943/d063c689-a254-45cf-8ba5-07fc88b354b6/Screen%20Shot%202022-04-15%20at%205.21.33%20PM.png!





[GitHub] [hudi] alexeykudinkin opened a new pull request, #5337: [HUDI-3891] Fixing files partitioning sequence for `BaseFileOnlyRelation`

2022-04-15 Thread GitBox


alexeykudinkin opened a new pull request, #5337:
URL: https://github.com/apache/hudi/pull/5337

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Fixing the file partitioning sequence for `BaseFileOnlyRelation` to make 
sure we bucket small files efficiently. This brings Hudi tables on par with 
raw Parquet tables.
   
   ## Brief change log
   
- Make sure we reverse sort the files before bucketing
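
   To show why the sort order matters, here is a simplified next-fit 
bucketing model. It is a sketch under stated assumptions, not the code from 
this PR or from Spark: the function name `bucket_files`, the file sizes, and 
the 100-unit bucket limit are all made up for illustration. Files are added to 
the current bucket until the next one would overflow it, at which point the 
bucket is closed.

```python
def bucket_files(sizes, max_bucket_size):
    """Greedy next-fit: close the current bucket when the next file overflows."""
    buckets, current, current_size = [], [], 0
    for size in sizes:
        if current and current_size + size > max_bucket_size:
            buckets.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        buckets.append(current)
    return buckets


# Hypothetical file sizes (think MB) with large and small files interleaved.
file_sizes = [90, 20, 90, 20, 90, 20]
MAX_BUCKET = 100

# In listing order, every small file is stranded between two large ones.
unsorted_buckets = bucket_files(file_sizes, MAX_BUCKET)

# Reverse-sorted, the large files go first and the small ones pack together.
sorted_buckets = bucket_files(sorted(file_sizes, reverse=True), MAX_BUCKET)

print(len(unsorted_buckets), unsorted_buckets)
print(len(sorted_buckets), sorted_buckets)
```

   With this input the listing order yields six buckets while the 
reverse-sorted order yields four. Next-fit is sensitive to input order, so the 
benefit depends on the size distribution; the sketch only illustrates the case 
where small files are scattered among large ones.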
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[GitHub] [hudi] xiarixiaoyao commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-04-15 Thread GitBox


xiarixiaoyao commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1100508616

   @minihippo  could you pls rebase the code and run azure again, thanks





[jira] [Created] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy

2022-04-15 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-3891:
-

 Summary: Investigate Hudi vs Raw Parquet table discrepancy
 Key: HUDI-3891
 URL: https://issues.apache.org/jira/browse/HUDI-3891
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin


While benchmarking queries on raw Parquet tables against Hudi tables, I ran 
the test against the same (Hudi) table:
 # In one query path I'm reading it as just a raw Parquet table
 # In another, I'm reading it as a Hudi RO (read_optimized) table


Surprisingly enough, those two diverge in the number of files being read:

 
_Raw Parquet_
!https://t18029943.p.clickup-attachments.com/t18029943/f700a129-35bc-4aaa-948c-9495392653f2/Screen%20Shot%202022-04-15%20at%205.20.41%20PM.png!
 
_Hudi_
!https://t18029943.p.clickup-attachments.com/t18029943/d063c689-a254-45cf-8ba5-07fc88b354b6/Screen%20Shot%202022-04-15%20at%205.21.33%20PM.png!





[GitHub] [hudi] hudi-bot commented on pull request #5336: [DOCS] Add commit activity, twitter badgers, and Hudi logo in README

2022-04-15 Thread GitBox


hudi-bot commented on PR #5336:
URL: https://github.com/apache/hudi/pull/5336#issuecomment-1100475164

   
   ## CI report:
   
   * 2d1fc1b7ff81bff43152335b8135a31467c53674 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8084)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5336: [DOCS] Add commit activity, twitter badgers, and Hudi logo in README

2022-04-15 Thread GitBox


hudi-bot commented on PR #5336:
URL: https://github.com/apache/hudi/pull/5336#issuecomment-1100448918

   
   ## CI report:
   
   * 2d1fc1b7ff81bff43152335b8135a31467c53674 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8084)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5336: [DOCS] Add commit activity, twitter badgers, and Hudi logo in README

2022-04-15 Thread GitBox


hudi-bot commented on PR #5336:
URL: https://github.com/apache/hudi/pull/5336#issuecomment-1100447825

   
   ## CI report:
   
   * 2d1fc1b7ff81bff43152335b8135a31467c53674 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] yihua opened a new pull request, #5336: [DOCS] Add commit activity, twitter badgers, and Hudi logo in README

2022-04-15 Thread GitBox


yihua opened a new pull request, #5336:
URL: https://github.com/apache/hudi/pull/5336

   ## What is the purpose of the pull request
   
   This PR adds commit activity, Twitter badges, and the Hudi logo to the README.
   
   The medium-definition Hudi logo image is added to the Hudi site in #5331 .
   
   ## Verify this pull request
   
   Only README.md updates.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[jira] [Updated] (HUDI-3883) File-sizing issues when writing COW table to S3

2022-04-15 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3883:
--
Fix Version/s: 0.12.0

> File-sizing issues when writing COW table to S3
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.12.0
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected 
> when writing a partitioned table. In this case I was running an ingestion job 
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024),  // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)   // 120 MB
> ) {code}
>  
> Instead, my table contains a lot of very small (~1 MB) files: 
> !Screen Shot 2022-04-14 at 1.08.19 PM.png!
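
For readers who set these options from Python rather than Scala, the two byte values quoted in the report can be derived as in the minimal sketch below. It is not taken from the issue: the helper name `file_sizing_options` is hypothetical, and only the two `hoodie.parquet.*` keys and the 100 MB / 120 MB figures come from the report.

```python
MB = 1024 * 1024  # bytes per MB, matching the 1024 * 1024 factor used in the report


def file_sizing_options(small_file_limit_mb: int, max_file_size_mb: int) -> dict:
    """Build the two file-sizing options as byte counts rendered as strings.

    Hypothetical helper for illustration; it only does the arithmetic.
    """
    return {
        "hoodie.parquet.small.file.limit": str(small_file_limit_mb * MB),
        "hoodie.parquet.max.file.size": str(max_file_size_mb * MB),
    }


opts = file_sizing_options(100, 120)
for key, value in opts.items():
    print(f"{key} = {value}")
```

A dictionary like this could be handed to a PySpark writer with `.options(**opts)`. The report's point is that the resulting files were still around 1 MB, so the sketch shows only the intended configuration, not a fix.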



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] yihua merged pull request #5334: [MINOR] - updated external article list on Hudi docs

2022-04-15 Thread GitBox


yihua merged PR #5334:
URL: https://github.com/apache/hudi/pull/5334





[hudi] branch asf-site updated: [DOCS] Updated external article list on Hudi docs (#5334)

2022-04-15 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new ca6752b8a1 [DOCS] Updated external article list on Hudi docs (#5334)
ca6752b8a1 is described below

commit ca6752b8a1b51a44916e813ded88c205645fc5e8
Author: Kyle Weller 
AuthorDate: Fri Apr 15 15:20:32 2022 -0700

[DOCS] Updated external article list on Hudi docs (#5334)
---
 website/src/pages/talks-articles.md | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/website/src/pages/talks-articles.md 
b/website/src/pages/talks-articles.md
index 13024b6d95..dff15f5bd6 100644
--- a/website/src/pages/talks-articles.md
+++ b/website/src/pages/talks-articles.md
@@ -94,6 +94,8 @@ Data Summit Connect, May, 2021
 
 39. ["Apache Hudi Meetup at Uber with talks from Philips, Moveworks & Uber 
(including Hudi OSS roadmap 2022)"](https://youtu.be/8Q0kM-emMyo) - By Felix 
Kizhakkel Jose (Philips), Bhavani Sudha (Moveworks), Prashant Wason (Uber), 
March 2022
 
+40. ["Apache Hudi with Vinoth 
Chandar"](https://softwareengineeringdaily.com/2022/03/08/apache-hudi-with-vinoth-chandar/)
 By Software Engineering Daily. Mar 5, 2022 
+
 ## Articles
 
 You can check out [our blog pages](https://hudi.apache.org/blog.html) for 
content written by our committers/contributors.
@@ -135,4 +137,18 @@ You can check out [our blog 
pages](https://hudi.apache.org/blog.html) for conten
 34. 
["https://www.xenonstack.com/insights/what-is-hudi;](https://www.xenonstack.com/insights/what-is-hudi)
 by Chandan Gaur. Nov 22, 2021
 35. 
["https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-0-7-0-and-0-8-0-available-on-amazon-emr/;](https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-0-7-0-and-0-8-0-available-on-amazon-emr/)
 by Udit Mehotra and Gagan Brahmi. Dec 20, 2021
 36. ["Designing the Analytics patterns using a Lake House approach on 
AWS"](https://dev.to/aws-builders/designing-the-analytics-patterns-using-a-lake-house-approach-on-aws-2hh6)
 by Adit Modi. Dec 30, 2021
-37. ["The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and 
Debezium"](https://garystafford.medium.com/the-art-of-building-open-data-lakes-with-apache-hudi-kafka-hive-and-debezium-3d2f71c5981f)
 by Gary Stafford. Dec 31, 2021
\ No newline at end of file
+37. ["The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and 
Debezium"](https://garystafford.medium.com/the-art-of-building-open-data-lakes-with-apache-hudi-kafka-hive-and-debezium-3d2f71c5981f)
 by Gary Stafford. Dec 31, 2021
+38. ["Why and How I Integrated Airbyte and Apache 
Hudi"](https://selectfrom.dev/why-and-how-i-integrated-airbyte-and-apache-hudi-c18aff3af21a)
 by Harsha Kanna. Jan 18, 2022
+39. ["Hudi powering data lake efforts at Walmart and Disney+ 
Hotstar"](https://www.techtarget.com/searchdatamanagement/feature/Hudi-powering-data-lake-efforts-at-Walmart-and-Disney-Hotstar)
 by Sean Kerner. Jan 20, 2022
+40. ["Cost Efficiency @ Scale in Big Data File 
Format"](https://eng.uber.com/cost-efficiency-big-data/) by Xinli Shang, Kai 
Jiang, Zheng Shao, and Mohammad Islam. Jan 25, 2022
+41. ["Onehouse Commitment to 
Openness"](https://www.onehouse.ai/blog/onehouse-commitment-to-openness) by 
Vinoth Chandar. Feb 2, 2022
+42. ["Onehouse brings a fully-managed lakehouse to Apache 
Hudi"](https://venturebeat.com/2022/02/03/onehouse-brings-a-fully-managed-lakehouse-to-apache-hudi/)
 by Paul Sawers. Feb 3, 2022
+43. ["ACID transformations on Distributed file 
system"](https://medium.com/walmartglobaltech/acid-transformations-on-distributed-file-system-fdec5301c1b1)
 by Rajasekhar. Feb 9, 2022
+44. ["Open Source Data Lake Table Formats: Evaluating Current Interest and 
Rate of 
Adoption"](https://garystafford.medium.com/data-lake-table-formats-interest-and-adoption-rate-40817b87be9e)
 by Gary Stafford. Feb 12, 2022
+45. ["Fresher Data Lake on AWS 
S3"](https://robinhood.engineering/author-balaji-varadarajan-e3f496815ebf) by 
Balaji Varadarajan. Feb 17, 2022
+46. ["Understanding its core concepts from hudi persistence 
files"](https://programmer.ink/think/understanding-its-core-concepts-from-hudi-persistence-files.html)
 by QbertsBrother. Feb 20, 2022
+47. ["Create a low-latency source-to-data lake pipeline using Amazon MSK 
Connect, Apache Flink, and Apache 
Hudi"](https://aws.amazon.com/blogs/big-data/create-a-low-latency-source-to-data-lake-pipeline-using-amazon-msk-connect-apache-flink-and-apache-hudi/)
 by Ali Alemi. Mar 1, 2022
+48. ["Build a serverless pipeline to analyze streaming data using AWS Glue, 
Apache Hudi, and Amazon 
S3"](https://aws.amazon.com/blogs/big-data/build-a-serverless-pipeline-to-analyze-streaming-data-using-aws-glue-apache-hudi-and-amazon-s3/)
 by Nikhil Khokhar and Dipta Bhattacharya. Mar 9, 2022
+49. ["Zendesk - Insights for CTOs: Part 3 – Growing 

[hudi] branch asf-site updated (d926276036 -> ab49d9bcd8)

2022-04-15 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


from d926276036 [MINOR] Fix docs build due to std-env (#5335)
 add ab49d9bcd8 GitHub Actions build asf-site

No new revisions were added by this update.

Summary of changes:
 content/404.html  | 4 ++--
 content/404/index.html| 4 ++--
 content/assets/js/{main.3d83d4a7.js => main.21fb549d.js}  | 4 ++--
 .../js/{main.3d83d4a7.js.LICENSE.txt => main.21fb549d.js.LICENSE.txt} | 0
 content/blog/2016/12/30/strata-talk-2017/index.html   | 4 ++--
 content/blog/2019/01/18/asf-incubation/index.html | 4 ++--
 content/blog/2019/03/07/batch-vs-incremental/index.html   | 4 ++--
 content/blog/2019/05/14/registering-dataset-to-hive/index.html| 4 ++--
 content/blog/2019/09/09/ingesting-database-changes/index.html | 4 ++--
 content/blog/2020/01/15/delete-support-in-hudi/index.html | 4 ++--
 content/blog/2020/01/20/change-capture-using-aws/index.html   | 4 ++--
 content/blog/2020/03/22/exporting-hudi-datasets/index.html| 4 ++--
 content/blog/2020/04/27/apache-hudi-apache-zepplin/index.html | 4 ++--
 .../blog/2020/05/28/monitoring-hudi-metrics-with-datadog/index.html   | 4 ++--
 .../2020/08/18/hudi-incremental-processing-on-data-lakes/index.html   | 4 ++--
 .../2020/08/20/efficient-migration-of-large-parquet-tables/index.html | 4 ++--
 content/blog/2020/08/21/async-compaction-deployment-model/index.html  | 4 ++--
 content/blog/2020/08/22/ingest-multiple-tables-using-hudi/index.html  | 4 ++--
 content/blog/2020/10/06/cdc-solution-using-hudi-by-nclouds/index.html | 4 ++--
 content/blog/2020/10/15/apache-hudi-meets-apache-flink/index.html | 4 ++--
 content/blog/2020/10/19/hudi-meets-aws-emr-and-aws-dms/index.html | 4 ++--
 content/blog/2020/11/11/hudi-indexing-mechanisms/index.html   | 4 ++--
 .../12/01/high-perf-data-lake-with-hudi-and-alluxio-t3go/index.html   | 4 ++--
 content/blog/2021/01/27/hudi-clustering-intro/index.html  | 4 ++--
 content/blog/2021/02/13/hudi-key-generators/index.html| 4 ++--
 content/blog/2021/03/01/hudi-file-sizing/index.html   | 4 ++--
 .../06/10/employing-right-configurations-for-hudi-cleaner/index.html  | 4 ++--
 content/blog/2021/07/21/streaming-data-lake-platform/index.html   | 4 ++--
 content/blog/2021/08/16/kafka-custom-deserializer/index.html  | 4 ++--
 content/blog/2021/08/18/improving-marker-mechanism/index.html | 4 ++--
 content/blog/2021/08/18/virtual-keys/index.html   | 4 ++--
 content/blog/2021/08/23/async-clustering/index.html   | 4 ++--
 content/blog/2021/08/23/s3-events-source/index.html   | 4 ++--
 .../01/building-eb-level-data-lake-using-hudi-at-bytedance/index.html | 4 ++--
 .../16/lakehouse-concurrency-control-are-we-too-optimistic/index.html | 4 ++--
 .../12/29/hudi-zorder-and-hilbert-space-filling-curves/index.html | 4 ++--
 content/blog/2022/01/06/apache-hudi-2021-a-year-in-review/index.html  | 4 ++--
 .../14/change-data-capture-with-debezium-and-apache-hudi/index.html   | 4 ++--
 content/blog/archive/index.html   | 4 ++--
 content/blog/index.html   | 4 ++--
 content/blog/page/2/index.html| 4 ++--
 content/blog/page/3/index.html| 4 ++--
 content/blog/streaming-data-lake-platform/index.html  | 4 ++--
 content/community/get-involved/index.html | 4 ++--
 content/community/syncs/index.html| 4 ++--
 content/community/team/index.html | 4 ++--
 content/contribute/developer-setup/index.html | 4 ++--
 content/contribute/how-to-contribute/index.html   | 4 ++--
 content/contribute/report-security-issues/index.html  | 4 ++--
 content/contribute/rfc-process/index.html | 4 ++--
 content/docs/0.10.0/azure_hoodie/index.html   | 4 ++--
 content/docs/0.10.0/bos_hoodie/index.html | 4 ++--
 content/docs/0.10.0/cli/index.html| 4 ++--
 content/docs/0.10.0/cloud/index.html  | 4 ++--
 content/docs/0.10.0/clustering/index.html | 4 ++--
 content/docs/0.10.0/compaction/index.html | 4 ++--
 content/docs/0.10.0/comparison/index.html | 4 ++--
 content/docs/0.10.0/concepts/index.html   | 4 ++--
 

[GitHub] [hudi] yihua merged pull request #5335: [MINOR] Fix docs build due to std-env

2022-04-15 Thread GitBox


yihua merged PR #5335:
URL: https://github.com/apache/hudi/pull/5335





[hudi] branch asf-site updated (805b893a71 -> d926276036)

2022-04-15 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 805b893a71 GitHub Actions build asf-site
 add d926276036 [MINOR] Fix docs build due to std-env (#5335)

No new revisions were added by this update.

Summary of changes:
 website/package.json | 1 +
 1 file changed, 1 insertion(+)



[GitHub] [hudi] yihua commented on pull request #5335: [MINOR] Fix docs build due to std-env

2022-04-15 Thread GitBox


yihua commented on PR #5335:
URL: https://github.com/apache/hudi/pull/5335#issuecomment-1100400894

   cc @vingov @bhasudha 





[GitHub] [hudi] yihua opened a new pull request, #5335: [MINOR] Fix docs build due to std-env

2022-04-15 Thread GitBox


yihua opened a new pull request, #5335:
URL: https://github.com/apache/hudi/pull/5335

   ## What is the purpose of the pull request
   
   This PR fixes the docs build failure caused by the latest std-env 3.1.1 release.  
   
   ## Brief change log
   
 - Uses version 3.0.1 of the "std-env" module in package.json instead.
   
   ## Verify this pull request
   
   The website can successfully be built after the fix.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[hudi] branch master updated: [MINOR] Fix typos in log4j-surefire.properties (#5212)

2022-04-15 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b8e465fdfc [MINOR] Fix typos in log4j-surefire.properties (#5212)
b8e465fdfc is described below

commit b8e465fdfcac1961fe05ed44993c8c6139e13b31
Author: 董可伦 
AuthorDate: Sat Apr 16 04:33:37 2022 +0800

[MINOR] Fix typos in log4j-surefire.properties (#5212)
---
 .../hudi-client-common/src/test/resources/log4j-surefire.properties   | 4 ++--
 .../hudi-flink-client/src/main/resources/log4j-surefire.properties| 4 ++--
 .../hudi-flink-client/src/test/resources/log4j-surefire.properties| 4 ++--
 .../hudi-java-client/src/test/resources/log4j-surefire.properties | 4 ++--
 .../hudi-spark-client/src/test/resources/log4j-surefire.properties| 4 ++--
 hudi-common/src/test/resources/log4j-surefire.properties  | 4 ++--
 .../hudi-examples-flink/src/test/resources/log4j-surefire.properties  | 4 ++--
 .../hudi-examples-spark/src/test/resources/log4j-surefire.properties  | 4 ++--
 .../hudi-flink/src/test/resources/log4j-surefire.properties   | 4 ++--
 hudi-hadoop-mr/src/test/resources/log4j-surefire.properties   | 4 ++--
 hudi-integ-test/src/test/resources/log4j-surefire.properties  | 4 ++--
 hudi-kafka-connect/src/test/resources/log4j-surefire.properties   | 4 ++--
 .../hudi-spark/src/test/resources/log4j-surefire.properties   | 4 ++--
 .../hudi-spark2/src/test/resources/log4j-surefire.properties  | 4 ++--
 .../hudi-spark3/src/test/resources/log4j-surefire.properties  | 4 ++--
 .../hudi-datahub-sync/src/test/resources/log4j-surefire.properties| 4 ++--
 hudi-sync/hudi-dla-sync/src/test/resources/log4j-surefire.properties  | 4 ++--
 hudi-sync/hudi-hive-sync/src/test/resources/log4j-surefire.properties | 4 ++--
 .../hudi-sync-common/src/test/resources/log4j-surefire.properties | 4 ++--
 hudi-timeline-service/src/test/resources/log4j-surefire.properties| 4 ++--
 hudi-utilities/src/test/resources/log4j-surefire.properties   | 4 ++--
 21 files changed, 42 insertions(+), 42 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/test/resources/log4j-surefire.properties 
b/hudi-client/hudi-client-common/src/test/resources/log4j-surefire.properties
index 32af462093..14bbb08972 100644
--- 
a/hudi-client/hudi-client-common/src/test/resources/log4j-surefire.properties
+++ 
b/hudi-client/hudi-client-common/src/test/resources/log4j-surefire.properties
@@ -20,9 +20,9 @@ log4j.logger.org.apache=INFO
 log4j.logger.org.apache.hudi=DEBUG
 log4j.logger.org.apache.hadoop.hbase=ERROR
 
-# A1 is set to be a ConsoleAppender.
+# CONSOLE is set to be a ConsoleAppender.
 log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
-# A1 uses PatternLayout.
+# CONSOLE uses PatternLayout.
 log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
 log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
 log4j.appender.CONSOLE.filter.a=org.apache.log4j.varia.LevelRangeFilter
diff --git 
a/hudi-client/hudi-flink-client/src/main/resources/log4j-surefire.properties 
b/hudi-client/hudi-flink-client/src/main/resources/log4j-surefire.properties
index 32af462093..14bbb08972 100644
--- a/hudi-client/hudi-flink-client/src/main/resources/log4j-surefire.properties
+++ b/hudi-client/hudi-flink-client/src/main/resources/log4j-surefire.properties
@@ -20,9 +20,9 @@ log4j.logger.org.apache=INFO
 log4j.logger.org.apache.hudi=DEBUG
 log4j.logger.org.apache.hadoop.hbase=ERROR
 
-# A1 is set to be a ConsoleAppender.
+# CONSOLE is set to be a ConsoleAppender.
 log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
-# A1 uses PatternLayout.
+# CONSOLE uses PatternLayout.
 log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
 log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
 log4j.appender.CONSOLE.filter.a=org.apache.log4j.varia.LevelRangeFilter
diff --git 
a/hudi-client/hudi-flink-client/src/test/resources/log4j-surefire.properties 
b/hudi-client/hudi-flink-client/src/test/resources/log4j-surefire.properties
index 32af462093..14bbb08972 100644
--- a/hudi-client/hudi-flink-client/src/test/resources/log4j-surefire.properties
+++ b/hudi-client/hudi-flink-client/src/test/resources/log4j-surefire.properties
@@ -20,9 +20,9 @@ log4j.logger.org.apache=INFO
 log4j.logger.org.apache.hudi=DEBUG
 log4j.logger.org.apache.hadoop.hbase=ERROR
 
-# A1 is set to be a ConsoleAppender.
+# CONSOLE is set to be a ConsoleAppender.
 log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
-# A1 uses PatternLayout.
+# CONSOLE uses PatternLayout.
 log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
 log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
 log4j.appender.CONSOLE.filter.a=org.apache.log4j.varia.LevelRangeFilter
diff --git 

[GitHub] [hudi] yihua merged pull request #5212: [MINOR] Fix typos in log4j-surefire.properties

2022-04-15 Thread GitBox


yihua merged PR #5212:
URL: https://github.com/apache/hudi/pull/5212





[GitHub] [hudi] nsivabalan commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-04-15 Thread GitBox


nsivabalan commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1100372635

   @xiarixiaoyao : can you review this when you get a chance? I have assigned 
it to myself as well, so I will try to review it within a week's time. 





[GitHub] [hudi] nsivabalan commented on pull request #5057: [HUDI-3651] optimize the hoodie hive client and ddl executor code wit…

2022-04-15 Thread GitBox


nsivabalan commented on PR #5057:
URL: https://github.com/apache/hudi/pull/5057#issuecomment-1100370034

   @wangxianghu : can you review the patch when you get a chance





[GitHub] [hudi] nsivabalan commented on pull request #5057: [HUDI-3651] optimize the hoodie hive client and ddl executor code wit…

2022-04-15 Thread GitBox


nsivabalan commented on PR #5057:
URL: https://github.com/apache/hudi/pull/5057#issuecomment-1100369298

   @JerryYue-M : can you rebase w/ latest master





[GitHub] [hudi] nsivabalan commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability

2022-04-15 Thread GitBox


nsivabalan commented on PR #5071:
URL: https://github.com/apache/hudi/pull/5071#issuecomment-1100367520

   @pratyakshsharma : once the patch is ready, do ping me here. I can review 





[GitHub] [hudi] nsivabalan commented on pull request #5087: [HUDI-3614] [DO_NOT_MERGE]Replace List with HoodieData in HoodieFlink/JavaTable and commit executors

2022-04-15 Thread GitBox


nsivabalan commented on PR #5087:
URL: https://github.com/apache/hudi/pull/5087#issuecomment-1100350369

   @danny0405 : can you follow up on the patch when you get a chance? I guess the 
author is waiting for a review follow-up from you. A gentle reminder. 





[GitHub] [hudi] kywe665 opened a new pull request, #5334: [MINOR] - updated external article list on Hudi docs

2022-04-15 Thread GitBox


kywe665 opened a new pull request, #5334:
URL: https://github.com/apache/hudi/pull/5334

   ## What is the purpose of the pull request
   
   updated the external articles for hudi docs
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [X] CI is green
   
- [X] Necessary doc changes done or have another open PR
  
- [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[GitHub] [hudi] bhasudha opened a new pull request, #5333: [DOCS] update broken links

2022-04-15 Thread GitBox


bhasudha opened a new pull request, #5333:
URL: https://github.com/apache/hudi/pull/5333

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   update broken links across the website
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[GitHub] [hudi] nsivabalan commented on pull request #5111: [HUDI-3695] Add a ORC reader in HoodieBaseRelation

2022-04-15 Thread GitBox


nsivabalan commented on PR #5111:
URL: https://github.com/apache/hudi/pull/5111#issuecomment-1100343028

   @alexeykudinkin : can you follow up on the review when you get a chance.
   @miomiocat : can you rebase w/ latest master





[GitHub] [hudi] nsivabalan commented on pull request #5139: [WIP][HUDI-3579] Add timeline commands in hudi-cli

2022-04-15 Thread GitBox


nsivabalan commented on PR #5139:
URL: https://github.com/apache/hudi/pull/5139#issuecomment-1100337466

   @yihua : ping me once the patch is ready to be reviewed again





[GitHub] [hudi] nsivabalan commented on pull request #5177: [HUDI-3746][DO_NOT_MERGE] Test CI

2022-04-15 Thread GitBox


nsivabalan commented on PR #5177:
URL: https://github.com/apache/hudi/pull/5177#issuecomment-1100334046

   can we close this. 





[GitHub] [hudi] nsivabalan closed pull request #5192: [WIP][DO_NOT_MERGE] Enable inline reading

2022-04-15 Thread GitBox


nsivabalan closed pull request #5192: [WIP][DO_NOT_MERGE] Enable inline reading
URL: https://github.com/apache/hudi/pull/5192





[jira] [Updated] (HUDI-3779) Add docs regarding caveats for disabling and re-enabling MDT

2022-04-15 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-3779:

Status: In Progress  (was: Open)

> Add docs regarding caveats for disabling and re-enabling MDT
> 
>
> Key: HUDI-3779
> URL: https://issues.apache.org/jira/browse/HUDI-3779
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> After disabling MDT, the user should make sure that MDT is completely deleted 
> after a few commits before re-enabling it.  The user should not flip the flag 
> off and on frequently; otherwise, there can be correctness issues.
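
The caveat above can be turned into a simple pre-check before re-enabling the metadata table (MDT). The sketch below is illustrative only and rests on stated assumptions: that the MDT is stored under `<base_path>/.hoodie/metadata`, and that the table sits on a local filesystem reachable through `os`. The function name `safe_to_reenable_mdt` is hypothetical and is not part of Hudi.

```python
import os
import tempfile

# Assumption: the metadata table lives under <base_path>/.hoodie/metadata.
METADATA_DIR = os.path.join(".hoodie", "metadata")


def safe_to_reenable_mdt(base_path: str) -> bool:
    """Return True only if no metadata table directory remains under base_path."""
    return not os.path.exists(os.path.join(base_path, METADATA_DIR))


# Demonstration on a throwaway directory standing in for a table base path.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, METADATA_DIR))
    before_cleanup = safe_to_reenable_mdt(base)  # MDT directory still present
    os.rmdir(os.path.join(base, METADATA_DIR))
    after_cleanup = safe_to_reenable_mdt(base)   # MDT directory removed

print(before_cleanup, after_cleanup)
```

For tables on S3 or HDFS the same check would need the corresponding filesystem client instead of `os.path`. The point is only that the flag is flipped back on after the old MDT is confirmed gone.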





[jira] [Updated] (HUDI-3779) Add docs regarding caveats for disabling and re-enabling MDT

2022-04-15 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-3779:

Status: Patch Available  (was: In Progress)

> Add docs regarding caveats for disabling and re-enabling MDT
> 
>
> Key: HUDI-3779
> URL: https://issues.apache.org/jira/browse/HUDI-3779
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> After disabling MDT, the user should make sure that MDT is completely deleted 
> after a few commits before re-enabling it.  The user should not flip the flag 
> off and on frequently; otherwise, there can be correctness issues.





[jira] [Updated] (HUDI-3779) Add docs regarding caveats for disabling and re-enabling MDT

2022-04-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3779:
-
Labels: pull-request-available  (was: )

> Add docs regarding caveats for disabling and re-enabling MDT
> 
>
> Key: HUDI-3779
> URL: https://issues.apache.org/jira/browse/HUDI-3779
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> After disabling MDT, the user should make sure that MDT is completely deleted 
> after a few commits before re-enabling it.  The user should not flip the flag 
> off and on frequently; otherwise, there can be correctness issues.





[GitHub] [hudi] yihua opened a new pull request, #5332: [HUDI-3779] Update metadata table docs

2022-04-15 Thread GitBox


yihua opened a new pull request, #5332:
URL: https://github.com/apache/hudi/pull/5332

   ## What is the purpose of the pull request
   
   This PR updates metadata table docs with more detailed configurations and 
deployment considerations based on 0.11.0 release.
   
   ## Brief change log
   
 - Revised `metadata.md`
   
   ## Verify this pull request
   
   The website and the page can be built and visualized properly.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[GitHub] [hudi] nsivabalan commented on pull request #5246: [HUDI-3813] [RFC-33] Schema Evolution Support DDL And DML Concurrency.

2022-04-15 Thread GitBox


nsivabalan commented on PR #5246:
URL: https://github.com/apache/hudi/pull/5246#issuecomment-1100299905

   @xushiyan : for now, I have assigned the PR to you. let me know if you can't 
take this up. I will find someone or I will take this up. 
   





[GitHub] [hudi] nsivabalan commented on pull request #5264: [HUDI-3818] encode bytes column value when generate HoodieKey

2022-04-15 Thread GitBox


nsivabalan commented on PR #5264:
URL: https://github.com/apache/hudi/pull/5264#issuecomment-1100297837

   Generally, the record key, partition path and precombine field should be 
comparable, and so are likely primitive types. I'm wondering what use case 
demands byte[] as a field for the record key or partition path. 
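
To illustrate the point about comparability: a Java `byte[]` compares by identity, not by content, so it behaves poorly as a key unless it is encoded first. The sketch below uses Base64 purely as an example encoding; it is an assumption for illustration, not necessarily what PR #5264 does, and `encodeKey` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class ByteKeyEncodingSketch {

  // Hypothetical helper: turn a byte[] column value into a stable string key.
  static String encodeKey(byte[] value) {
    return Base64.getEncoder().encodeToString(value);
  }

  public static void main(String[] args) {
    byte[] a = "id-1".getBytes(StandardCharsets.UTF_8);
    byte[] b = "id-1".getBytes(StandardCharsets.UTF_8);

    // Arrays inherit Object.equals, so two arrays with equal content differ.
    System.out.println("a.equals(b): " + a.equals(b));
    // Content comparison needs Arrays.equals.
    System.out.println("Arrays.equals(a, b): " + Arrays.equals(a, b));
    // The encoded strings compare by content, like any other String key.
    System.out.println("encoded equal: " + encodeKey(a).equals(encodeKey(b)));
  }
}
```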





[hudi] branch master updated: [HUDI-3835] Add UT for delete in java client (#5270)

2022-04-15 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 99dd1cb6e6 [HUDI-3835] Add UT for delete in java client (#5270)
99dd1cb6e6 is described below

commit 99dd1cb6e63600681aa11b3a03bc16d1401d8055
Author: 董可伦 
AuthorDate: Sat Apr 16 03:03:48 2022 +0800

[HUDI-3835] Add UT for delete in java client (#5270)
---
 .../commit/TestJavaCopyOnWriteActionExecutor.java  | 86 +-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestJavaCopyOnWriteActionExecutor.java b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestJavaCopyOnWriteActionExecutor.java
index 1bf1b4cccb..518414d614 100644
--- a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestJavaCopyOnWriteActionExecutor.java
+++ b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestJavaCopyOnWriteActionExecutor.java
@@ -318,7 +318,7 @@ public class TestJavaCopyOnWriteActionExecutor extends HoodieJavaClientTestBase
   }
 
   @Test
-    public void testInsertRecords() throws Exception {
+  public void testInsertRecords() throws Exception {
     HoodieWriteConfig config = makeHoodieClientConfig();
     String instantTime = makeNewCommitTime();
     metaClient = HoodieTableMetaClient.reload(metaClient);
@@ -465,6 +465,90 @@ public class TestJavaCopyOnWriteActionExecutor extends HoodieJavaClientTestBase
     verifyStatusResult(returnedStatuses, generateExpectedPartitionNumRecords(inputRecords));
   }
 
+  @Test
+  public void testDeleteRecords() throws Exception {
+    // Prepare the AvroParquetIO
+    HoodieWriteConfig config = makeHoodieClientConfig();
+    int startInstant = 1;
+    String firstCommitTime = makeNewCommitTime(startInstant++, "%09d");
+    HoodieJavaWriteClient writeClient = getHoodieWriteClient(config);
+    writeClient.startCommitWithTime(firstCommitTime);
+    metaClient = HoodieTableMetaClient.reload(metaClient);
+    BaseFileUtils fileUtils = BaseFileUtils.getInstance(metaClient);
+
+    String partitionPath = "2022/04/09";
+
+    // Get some records belong to the same partition (2016/01/31)
+    String recordStr1 = "{\"_row_key\":\"8eb5b87a-1feh-4edd-87b4-6ec96dc405a0\","
+        + "\"time\":\"2022-04-09T03:16:41.415Z\",\"number\":1}";
+    String recordStr2 = "{\"_row_key\":\"8eb5b87b-1feu-4edd-87b4-6ec96dc405a0\","
+        + "\"time\":\"2022-04-09T03:20:41.415Z\",\"number\":2}";
+    String recordStr3 = "{\"_row_key\":\"8eb5b87c-1fej-4edd-87b4-6ec96dc405a0\","
+        + "\"time\":\"2022-04-09T03:16:41.415Z\",\"number\":3}";
+
+    List<HoodieRecord> records = new ArrayList<>();
+    RawTripTestPayload rowChange1 = new RawTripTestPayload(recordStr1);
+    records.add(new HoodieAvroRecord(new HoodieKey(rowChange1.getRowKey(), rowChange1.getPartitionPath()), rowChange1));
+    RawTripTestPayload rowChange2 = new RawTripTestPayload(recordStr2);
+    records.add(new HoodieAvroRecord(new HoodieKey(rowChange2.getRowKey(), rowChange2.getPartitionPath()), rowChange2));
+    RawTripTestPayload rowChange3 = new RawTripTestPayload(recordStr3);
+    records.add(new HoodieAvroRecord(new HoodieKey(rowChange3.getRowKey(), rowChange3.getPartitionPath()), rowChange3));
+
+    // Insert new records
+    writeClient.insert(records, firstCommitTime);
+
+    FileStatus[] allFiles = getIncrementalFiles(partitionPath, "0", -1);
+    assertEquals(1, allFiles.length);
+
+    // Read out the bloom filter and make sure filter can answer record exist or not
+    Path filePath = allFiles[0].getPath();
+    BloomFilter filter = fileUtils.readBloomFilterFromMetadata(hadoopConf, filePath);
+    for (HoodieRecord record : records) {
+      assertTrue(filter.mightContain(record.getRecordKey()));
+    }
+
+    // Read the base file, check the record content
+    List<GenericRecord> fileRecords = fileUtils.readAvroRecords(hadoopConf, filePath);
+    int index = 0;
+    for (GenericRecord record : fileRecords) {
+      assertEquals(records.get(index).getRecordKey(), record.get("_row_key").toString());
+      index++;
+    }
+
+    String newCommitTime = makeNewCommitTime(startInstant++, "%09d");
+    writeClient.startCommitWithTime(newCommitTime);
+
+    // Test delete two records
+    List<HoodieKey> keysForDelete = new ArrayList(Arrays.asList(records.get(0).getKey(), records.get(2).getKey()));
+    writeClient.delete(keysForDelete, newCommitTime);
+
+    allFiles = getIncrementalFiles(partitionPath, "0", -1);
+    assertEquals(1, allFiles.length);
+
+    filePath = allFiles[0].getPath();
+    // Read the base file, check the record content
+    fileRecords = fileUtils.readAvroRecords(hadoopConf, filePath);
+    // Check that the two records are deleted successfully
+

[GitHub] [hudi] nsivabalan merged pull request #5270: [HUDI-3835] Add UT for delete in java client

2022-04-15 Thread GitBox


nsivabalan merged PR #5270:
URL: https://github.com/apache/hudi/pull/5270





[GitHub] [hudi] nsivabalan commented on pull request #5292: [WIP] Upgrade to Hadoop 3.x Hive 3.x

2022-04-15 Thread GitBox


nsivabalan commented on PR #5292:
URL: https://github.com/apache/hudi/pull/5292#issuecomment-1100295279

   Please prefix the title with the right JIRA ID. I understand it's still WIP, 
but this is a gentle reminder. 





[GitHub] [hudi] nsivabalan commented on pull request #5319: [WIP] Adjusting `DeltaStreamer` shutdown sequence to avoid awaiting for 24h

2022-04-15 Thread GitBox


nsivabalan commented on PR #5319:
URL: https://github.com/apache/hudi/pull/5319#issuecomment-1100291962

   please create a jira and tag 





[GitHub] [hudi] alexeykudinkin commented on pull request #5329: [HUDI-3886] Adding default null for some of the fields in col stats in MDT schema

2022-04-15 Thread GitBox


alexeykudinkin commented on PR #5329:
URL: https://github.com/apache/hudi/pull/5329#issuecomment-1100291410

   @nsivabalan done





[hudi] branch master updated (57612c5c32 -> e8ab915aff)

2022-04-15 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 57612c5c32 [HUDI-3848] Fixing restore with cleaned up commits (#5288)
 add e8ab915aff [MINOR] Removing invalid code to close parquet reader 
iterator (#5182)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/hudi/HoodieBaseRelation.scala   | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)



[GitHub] [hudi] nsivabalan commented on pull request #5329: [HUDI-3886] Adding default null for some of the fields in col stats in MDT schema

2022-04-15 Thread GitBox


nsivabalan commented on PR #5329:
URL: https://github.com/apache/hudi/pull/5329#issuecomment-1100289976

   @alexeykudinkin : can you stamp this





[GitHub] [hudi] nsivabalan merged pull request #5182: [MINOR] Fixing parquet reader iterator close

2022-04-15 Thread GitBox


nsivabalan merged PR #5182:
URL: https://github.com/apache/hudi/pull/5182





[hudi] branch master updated (9e8664f4d2 -> 57612c5c32)

2022-04-15 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 9e8664f4d2 [HOTFIX] add missing license (#5322) (#5324)
 add 57612c5c32 [HUDI-3848] Fixing restore with cleaned up commits (#5288)

No new revisions were added by this update.

Summary of changes:
 .../rollback/ListingBasedRollbackStrategy.java | 10 ++-
 .../TestHoodieSparkMergeOnReadTableRollback.java   | 88 ++
 2 files changed, 97 insertions(+), 1 deletion(-)



[GitHub] [hudi] nsivabalan merged pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

2022-04-15 Thread GitBox


nsivabalan merged PR #5288:
URL: https://github.com/apache/hudi/pull/5288





[jira] [Commented] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks

2022-04-15 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522920#comment-17522920
 ] 

sivabalan narayanan commented on HUDI-3749:
---

Handing it off to [~uditme] to take it from here. 

[~xushiyan] : I will let Udit drive this since the AWS folks need to upstream the 
changes they have internally to OSS anyway. 

> Run latest hudi w/ EMR spark and report to aws folks
> 
>
> Key: HUDI-3749
> URL: https://issues.apache.org/jira/browse/HUDI-3749
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>






[jira] [Updated] (HUDI-3749) Try out 0.11 hudi w/ EMR spark

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3749:
--
Summary: Try out 0.11 hudi w/ EMR spark   (was: Run latest hudi w/ EMR 
spark and report to aws folks)

> Try out 0.11 hudi w/ EMR spark 
> ---
>
> Key: HUDI-3749
> URL: https://issues.apache.org/jira/browse/HUDI-3749
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.11.0
>
>






[jira] [Assigned] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-3749:
-

Assignee: Udit Mehrotra  (was: sivabalan narayanan)

> Run latest hudi w/ EMR spark and report to aws folks
> 
>
> Key: HUDI-3749
> URL: https://issues.apache.org/jira/browse/HUDI-3749
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.11.0
>
>






[jira] [Commented] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks

2022-04-15 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522919#comment-17522919
 ] 

sivabalan narayanan commented on HUDI-3749:
---

regular hive sync worked out of the box. 
{code:java}
df.write.format("hudi").
  option(PRECOMBINE_FIELD_OPT_KEY, "tpep_dropoff_datetime").
  option(RECORDKEY_FIELD_OPT_KEY, "tpep_pickup_datetime").
  option(PARTITIONPATH_FIELD_OPT_KEY, "date_col").
  option(TABLE_NAME, "hudi_tbl1").
  option("hoodie.embed.timeline.server","false").
  option("hoodie.datasource.hive_sync.enable","true").
  option("hoodie.datasource.hive_sync.database","default").
  option("hoodie.datasource.hive_sync.table","test_tbl3").
  option("hoodie.datasource.hive_sync.mode","hms").
  option("hoodie.datasource.hive_sync.partition_fields","_hoodie_partition_path").
  mode(Overwrite).
  save(basePath)
 {code}
 

 

via beeline:
{code:java}
select * from test_tbl3 limit 5;{code}
{code:java}
| test_tbl3._hoodie_commit_time | test_tbl3._hoodie_commit_seqno | test_tbl3._hoodie_record_key | test_tbl3._hoodie_file_name | test_tbl3.vendorid | test_tbl3.tpep_pickup_datetime | test_tbl3.tpep_dropoff_datetime | test_tbl3.passenger_count | test_tbl3.trip_distance | test_tbl3.ratecodeid | test_tbl3.store_and_fwd_flag | test_tbl3.pulocationid | test_tbl3.dolocationid | test_tbl3.payment_type | test_tbl3.fare_amount | test_tbl3.extra | test_tbl3.mta_tax | test_tbl3.tip_amount | test_tbl3.tolls_amount | test_tbl3.improvement_surcharge | test_tbl3.total_amount | test_tbl3.congestion_surcharge | test_tbl3.date_col | test_tbl3._hoodie_partition_path |
| 20220415180627021 | 20220415180627021_7_1085992 | 2008-12-31 23:02:59 | e78169d4-03a8-40e0-ad11-9ae43a52b565-0_7-155-6608_20220415180627021.parquet | 2 | 2008-12-31 23:02:59 | 2009-01-01 18:22:41 | 1 | 0.99 | 1 | N | 249 | 90 | 2 | 7.0 | 1.0 | 0.5 | 0.0 | 0.0 | 0.3 | 11.3 | 2.5 | 2008-12-31 | 2008-12-31 |
| 20220415180627021 | 20220415180627021_7_1085996 | 2008-12-31 23:07:03 | e78169d4-03a8-40e0-ad11-9ae43a52b565-0_7-155-6608_20220415180627021.parquet | 2 | 2008-12-31 23:07:03 | 2008-12-31 23:19:26 | 1 | 1.39 | 1 | N | 107 | 162 | 2 | 8.5 | 0.0 | 0.5 | 0.0 | 0.0 | 0.3 | 11.8 | 2.5 | 2008-12-31 | 2008-12-31 |
| 20220415180627021 | 20220415180627021_7_1085998 | 2008-12-31 23:43:51 | e78169d4-03a8-40e0-ad11-9ae43a52b565-0_7-155-6608_20220415180627021.parquet | 2 | 2008-12-31 23:43:51 | 2009-01-01 10:32:34 | 1 | 0.79 | 1 | N | 

[jira] [Updated] (HUDI-3890) Fix apache rat check to detect all missing license

2022-04-15 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3890:
-
Priority: Critical  (was: Major)

> Fix apache rat check to detect all missing license
> --
>
> Key: HUDI-3890
> URL: https://issues.apache.org/jira/browse/HUDI-3890
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Priority: Critical
>
> these 2 files which didn't have license were not reported
> ./hudi-utilities/src/test/resources/delta-streamer-config/schema_registry.source_schema_tab.sql
> ./hudi-utilities/src/test/resources/delta-streamer-config/schema_registry.target_schema_tab.sql





[jira] [Updated] (HUDI-3890) Fix apache rat check to detect all missing license

2022-04-15 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3890:
-
Fix Version/s: 0.12.0

> Fix apache rat check to detect all missing license
> --
>
> Key: HUDI-3890
> URL: https://issues.apache.org/jira/browse/HUDI-3890
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Priority: Critical
> Fix For: 0.12.0
>
>
> these 2 files which didn't have license were not reported
> ./hudi-utilities/src/test/resources/delta-streamer-config/schema_registry.source_schema_tab.sql
> ./hudi-utilities/src/test/resources/delta-streamer-config/schema_registry.target_schema_tab.sql





[jira] [Created] (HUDI-3890) Fix apache rat check to detect all missing license

2022-04-15 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3890:


 Summary: Fix apache rat check to detect all missing license
 Key: HUDI-3890
 URL: https://issues.apache.org/jira/browse/HUDI-3890
 Project: Apache Hudi
  Issue Type: Task
Reporter: Raymond Xu


these 2 files which didn't have license were not reported

./hudi-utilities/src/test/resources/delta-streamer-config/schema_registry.source_schema_tab.sql
./hudi-utilities/src/test/resources/delta-streamer-config/schema_registry.target_schema_tab.sql
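
As a rough cross-check for files such as the two `.sql` files above, a small scan can list files of a given extension that lack the ASF header. This is only a sketch, not how Apache RAT works: it assumes a file counts as licensed if it contains the phrase "Licensed to the Apache Software Foundation", and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class MissingLicenseScan {

  // Assumption: a file counts as licensed if it contains this phrase from the ASF header.
  static final String LICENSE_MARKER = "Licensed to the Apache Software Foundation";

  static boolean hasLicenseHeader(String content) {
    return content.contains(LICENSE_MARKER);
  }

  // Reads one file and reports whether the marker is absent.
  private static boolean missingLicense(Path file) {
    try {
      String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      return !hasLicenseHeader(content);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Walks root and returns regular files with the given extension that lack the marker.
  static List<Path> filesMissingLicense(Path root, String extension) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(extension))
          .filter(MissingLicenseScan::missingLicense)
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Scan the directory given as the first argument, or the current directory.
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    for (Path file : filesMissingLicense(root, ".sql")) {
      System.out.println("missing license: " + file);
    }
  }
}
```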





[GitHub] [hudi] yihua opened a new pull request, #5331: [MINOR] Add a medium-definition Hudi logo

2022-04-15 Thread GitBox


yihua opened a new pull request, #5331:
URL: https://github.com/apache/hudi/pull/5331

   ## What is the purpose of the pull request
   
   As above.
   
   ## Brief change log
   
 - Adds `website/static/assets/images/hudi-logo-medium.png`.
   
   ## Verify this pull request
   
   The website is built locally and the new image can be accessed by 
`http://localhost:3000/assets/images/hudi-logo-medium.png`.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[jira] [Created] (HUDI-3889) Do not validate table config if save mode is set to Overwrite

2022-04-15 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3889:
-

 Summary: Do not validate table config if save mode is set to 
Overwrite
 Key: HUDI-3889
 URL: https://issues.apache.org/jira/browse/HUDI-3889
 Project: Apache Hudi
  Issue Type: Task
  Components: spark
Reporter: sivabalan narayanan


with spark datasource write, if Overwrite is set as save mode, we should not do 
table config validation 

 
{code:java}
scala> df.write.format("hudi").
     |   option(PRECOMBINE_FIELD_OPT_KEY, "tpep_dropoff_datetime").
     |   option(RECORDKEY_FIELD_OPT_KEY, "tpep_pickup_datetime").
     |   option(PARTITIONPATH_FIELD_OPT_KEY, "date_col").
     |   option(TABLE_NAME, "hudi_tbl1").
     |   option("hoodie.embed.timeline.server","false").
     |   mode(Overwrite).
     |   save(basePath)
warning: one deprecation; for details, enable `:setting -deprecation' or `:replay -deprecation'
org.apache.hudi.exception.HoodieException: Config conflict(key  current value   existing value):
RecordKey:  tpep_pickup_datetime    id
PreCombineKey:  tpep_dropoff_datetime   created_at
  at org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:161)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:87)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:161)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 {code}
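
The behaviour requested here can be sketched as follows. This is a toy model, not Hudi's `HoodieWriterUtils.validateTableConfig`: `SaveMode` is a hypothetical stand-in for Spark's enum, `findConflicts` is a made-up helper, and the sample values are modelled on the report above. It only shows the shape of the proposed rule: compare incoming options against the existing table config unless the save mode is Overwrite.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OverwriteValidationSketch {

  // Hypothetical stand-in for Spark's SaveMode; only the values needed here.
  enum SaveMode { APPEND, OVERWRITE }

  // Returns one line per key whose incoming value differs from the existing table config.
  // With OVERWRITE the existing table is replaced, so nothing is compared.
  static List<String> findConflicts(SaveMode mode,
                                    Map<String, String> incoming,
                                    Map<String, String> existing) {
    List<String> conflicts = new ArrayList<>();
    if (mode == SaveMode.OVERWRITE) {
      return conflicts;
    }
    for (Map.Entry<String, String> entry : incoming.entrySet()) {
      String existingValue = existing.get(entry.getKey());
      if (existingValue != null && !existingValue.equals(entry.getValue())) {
        conflicts.add(entry.getKey() + ":\t" + entry.getValue() + "\t" + existingValue);
      }
    }
    return conflicts;
  }

  public static void main(String[] args) {
    Map<String, String> incoming = new LinkedHashMap<>();
    incoming.put("RecordKey", "tpep_pickup_datetime");
    incoming.put("PreCombineKey", "tpep_dropoff_datetime");

    Map<String, String> existing = new LinkedHashMap<>();
    existing.put("RecordKey", "id");
    existing.put("PreCombineKey", "created_at");

    // Append against an existing table reports both conflicts.
    System.out.println(findConflicts(SaveMode.APPEND, incoming, existing));
    // Overwrite skips the comparison entirely.
    System.out.println(findConflicts(SaveMode.OVERWRITE, incoming, existing));
  }
}
```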





[jira] [Updated] (HUDI-3889) Do not validate table config if save mode is set to Overwrite

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3889:
--
Priority: Critical  (was: Major)

> Do not validate table config if save mode is set to Overwrite
> -
>
> Key: HUDI-3889
> URL: https://issues.apache.org/jira/browse/HUDI-3889
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark
>Reporter: sivabalan narayanan
>Priority: Critical
>
> with spark datasource write, if Overwrite is set as save mode, we should not 
> do table config validation 
>  
> {code:java}
> scala> df.write.format("hudi").
>      |   option(PRECOMBINE_FIELD_OPT_KEY, "tpep_dropoff_datetime").
>      |   option(RECORDKEY_FIELD_OPT_KEY, "tpep_pickup_datetime").
>      |   option(PARTITIONPATH_FIELD_OPT_KEY, "date_col").
>      |   option(TABLE_NAME, "hudi_tbl1").
>      |   option("hoodie.embed.timeline.server","false").
>      |   mode(Overwrite).
>      |   save(basePath)
> warning: one deprecation; for details, enable `:setting -deprecation' or `:replay -deprecation'
> org.apache.hudi.exception.HoodieException: Config conflict(key  current value   existing value):
> RecordKey:  tpep_pickup_datetime    id
> PreCombineKey:  tpep_dropoff_datetime   created_at
>   at org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:161)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:87)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:161)
>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  {code}





[jira] [Updated] (HUDI-3889) Do not validate table config if save mode is set to Overwrite

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3889:
--
Fix Version/s: 0.12.0

> Do not validate table config if save mode is set to Overwrite
> -
>
> Key: HUDI-3889
> URL: https://issues.apache.org/jira/browse/HUDI-3889
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark
>Reporter: sivabalan narayanan
>Priority: Critical
> Fix For: 0.12.0
>
>
> with spark datasource write, if Overwrite is set as save mode, we should not 
> do table config validation 
>  
> {code:java}
> scala> df.write.format("hudi").
>      |   option(PRECOMBINE_FIELD_OPT_KEY, "tpep_dropoff_datetime").
>      |   option(RECORDKEY_FIELD_OPT_KEY, "tpep_pickup_datetime").
>      |   option(PARTITIONPATH_FIELD_OPT_KEY, "date_col").
>      |   option(TABLE_NAME, "hudi_tbl1").
>      |   option("hoodie.embed.timeline.server","false").
>      |   mode(Overwrite).
>      |   save(basePath)
> warning: one deprecation; for details, enable `:setting -deprecation' or `:replay -deprecation'
> org.apache.hudi.exception.HoodieException: Config conflict(key  current value   existing value):
> RecordKey:  tpep_pickup_datetime    id
> PreCombineKey:  tpep_dropoff_datetime   created_at
>   at org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:161)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:87)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:161)
>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  {code}





[GitHub] [hudi] hudi-bot commented on pull request #5328: [WIP] Fix Bulk Insert to repartition the dataset based on Partition Path

2022-04-15 Thread GitBox


hudi-bot commented on PR #5328:
URL: https://github.com/apache/hudi/pull/5328#issuecomment-1100234074

   
   ## CI report:
   
   * 6812e0065e1411107d7d53ad2997d02e7ce34d06 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8079)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan commented on pull request #5328: [WIP] Fix Bulk Insert to repartition the dataset based on Partition Path

2022-04-15 Thread GitBox


nsivabalan commented on PR #5328:
URL: https://github.com/apache/hudi/pull/5328#issuecomment-1100196310

   High-level comment: I would prefer to introduce a new sort mode instead of 
fixing NONE, and to add documentation on when to use which sort mode so that 
users are aware of the different sort modes and their implications.
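
To make the trade-off concrete, here is a toy model of why repartitioning by partition path matters for bulk insert. It is not Hudi code: it assumes each write task produces at least one file per partition path it receives, and compares round-robin distribution of records with routing by partition path. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BulkInsertPartitioningSketch {

  // Counts distinct (task, partitionPath) pairs: a rough proxy for files written,
  // assuming each task writes at least one file per partition path it receives.
  static int filesWritten(List<String> partitionPaths, int numTasks, boolean repartitionByPath) {
    Set<String> taskPartitionPairs = new HashSet<>();
    for (int i = 0; i < partitionPaths.size(); i++) {
      String path = partitionPaths.get(i);
      // Either route every record of a partition path to one task, or spread records round-robin.
      int task = repartitionByPath ? Math.floorMod(path.hashCode(), numTasks) : i % numTasks;
      taskPartitionPairs.add(task + "/" + path);
    }
    return taskPartitionPairs.size();
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList(
        "p1", "p2", "p3", "p1", "p2", "p3", "p1", "p2", "p3", "p1", "p2", "p3");
    System.out.println("round-robin: " + filesWritten(paths, 4, false));
    System.out.println("by partition path: " + filesWritten(paths, 4, true));
  }
}
```

In this model, routing by partition path gives one file per partition, while round-robin distribution can give up to one file per task per partition. The cost is an extra shuffle, which is the kind of implication the documentation on sort modes would need to spell out.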





[GitHub] [hudi] hudi-bot commented on pull request #5328: [WIP] Fix Bulk Insert to repartition the dataset based on Partition Path

2022-04-15 Thread GitBox


hudi-bot commented on PR #5328:
URL: https://github.com/apache/hudi/pull/5328#issuecomment-1100192710

   
   ## CI report:
   
   * 96b33942edf6a1d6d89361d2e056ed1c3a8d326b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8077)
 
   * 6812e0065e1411107d7d53ad2997d02e7ce34d06 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8079)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #5328: [WIP] Fix Bulk Insert to repartition the dataset based on Partition Path

2022-04-15 Thread GitBox


hudi-bot commented on PR #5328:
URL: https://github.com/apache/hudi/pull/5328#issuecomment-1100190821

   
   ## CI report:
   
   * 96b33942edf6a1d6d89361d2e056ed1c3a8d326b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8077)
 
   * 6812e0065e1411107d7d53ad2997d02e7ce34d06 UNKNOWN
   
   
   





[jira] [Updated] (HUDI-3826) Make truncate partition use delete_partition operation

2022-04-15 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3826:
-
Reviewers: Alexey Kudinkin, Raymond Xu, sivabalan narayanan  (was: Alexey 
Kudinkin, sivabalan narayanan)

> Make truncate partition use delete_partition operation
> --
>
> Key: HUDI-3826
> URL: https://issues.apache.org/jira/browse/HUDI-3826
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Forward Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Currently, `TruncateHoodieTableCommand` and 
> `AlterHoodieTableDropPartitionCommand` delete partitions from a Hudi table by 
> simply removing the corresponding partition folders without committing any 
> changes (and, for example, without updating the metadata table accordingly). 
> Instead, they should go through the WriteClient's `deletePartitions` API, 
> similar to what the Spark datasource does when it receives Hudi's DELETE command.
> You can see this by enabling the Column Stats Index by default and running our CI 
> (setting "hoodie.metadata.index.column.stats.enable"
> and "hoodie.metadata.enable" to true):
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=7926=logs=dcedfe73-9485-5cc5-817a-73b61fc5dcb0=746585d8-b50a-55c3-26c5-517d93af9934





[jira] [Created] (HUDI-3888) Triage drop partition col with CI

2022-04-15 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3888:


 Summary: Triage drop partition col with CI
 Key: HUDI-3888
 URL: https://issues.apache.org/jira/browse/HUDI-3888
 Project: Apache Hudi
  Issue Type: Task
  Components: tests-ci
Reporter: Raymond Xu
Assignee: Ethan Guo
 Fix For: 0.11.0








[jira] [Updated] (HUDI-3888) Triage drop partition col with CI

2022-04-15 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3888:
-
Sprint: Hudi-Sprint-Apr-12

> Triage drop partition col with CI
> -
>
> Key: HUDI-3888
> URL: https://issues.apache.org/jira/browse/HUDI-3888
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.11.0
>
>






[jira] [Updated] (HUDI-3888) Triage drop partition col with CI

2022-04-15 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3888:
-
Status: In Progress  (was: Open)

> Triage drop partition col with CI
> -
>
> Key: HUDI-3888
> URL: https://issues.apache.org/jira/browse/HUDI-3888
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.11.0
>
>






[jira] [Assigned] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-3707:
-

Assignee: sivabalan narayanan  (was: Sagar Sumit)

> Fix deltastreamer test with schema provider and transformer enabled
> ---
>
> Key: HUDI-3707
> URL: https://issues.apache.org/jira/browse/HUDI-3707
> Project: Apache Hudi
>  Issue Type: Test
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0, 0.12.0
>
>
> Fix cases like this
> @Disabled("To investigate problem with schema provider and transformer")
> in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer





[jira] [Updated] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled

2022-04-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3707:
--
Status: In Progress  (was: Open)

> Fix deltastreamer test with schema provider and transformer enabled
> ---
>
> Key: HUDI-3707
> URL: https://issues.apache.org/jira/browse/HUDI-3707
> Project: Apache Hudi
>  Issue Type: Test
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0, 0.12.0
>
>
> Fix cases like this
> @Disabled("To investigate problem with schema provider and transformer")
> in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer





[jira] [Closed] (HUDI-3867) Disable Data Skipping by default in 0.11

2022-04-15 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3867.

Resolution: Fixed

> Disable Data Skipping by default in 0.11
> 
>
> Key: HUDI-3867
> URL: https://issues.apache.org/jira/browse/HUDI-3867
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Since it now relies on MT's Column Stats Index, which is off by default in 0.11
>  
> We should re-enable it right after the release.





[GitHub] [hudi] Guanpx commented on issue #5330: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2022-04-15 Thread GitBox


Guanpx commented on issue #5330:
URL: https://github.com/apache/hudi/issues/5330#issuecomment-1100019832

   cc @danny0405 





[GitHub] [hudi] kasured commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-15 Thread GitBox


kasured commented on issue #5298:
URL: https://github.com/apache/hudi/issues/5298#issuecomment-104068

   Upon further investigation, and after enabling additional logs on EMR, I found that the 
file is deleted during compaction in the method 
org.apache.hudi.table.HoodieTable#reconcileAgainstMarkers:
   
   ```
   if (!invalidDataPaths.isEmpty()) {
     LOG.info("Removing duplicate data files created due to spark retries before committing. Paths=" + invalidDataPaths);
   ```
   
   However, later in the logs this same file is reported as written and committed in the instant:
   ```
   INFO SparkRDDWriteClient: Committing Compaction 20220414232316. Finished 
with result 
HoodieCommitMetadata{partitionToWriteStats={cluster=96/shard=14377=[HoodieWriteStat{fileId='9d9f72e9-9381-40d0-af0c-cb48c25bd78d-0',
 
path='cluster=96/shard=14377/9d9f72e9-9381-40d0-af0c-cb48c25bd78d-0_0-617-7132_20220414232316.parquet',
 prevCommit='20220414225217', numWrites=122886, numDeletes=0, 
numUpdateWrites=121939, totalWriteBytes=23331178, totalWriteErrors=0, 
tempPath='null', partitionPath='cluster=96/shard=14377', 
totalLogRecords=341027, totalLogFilesCompacted=3, 
totalLogSizeCompacted=285373803, totalUpdatedRecordsCompacted=121939, 
totalLogBlocks=9, totalCorruptLogBlock=0, totalRollbackBlocks=0}]}, 
compacted=true,
   ```
   So it leaves the system in an inconsistent state. This looks like a 
concurrency issue to me.
   
   I will try submitting multiple StreamingQuery instances in different threads by 
leveraging Spark scheduler pools, and will update on the status.
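
To make the inconsistency described above concrete, here is a minimal sketch of the set arithmetic behind marker-based reconciliation, plus a check for committed files that are missing from storage. It is not Hudi's implementation: the class, method names and file names are hypothetical, and it assumes that reconciliation deletes every file that has a marker but is absent from the commit's write stats.

```java
import java.util.Set;
import java.util.TreeSet;

public class MarkerReconcileSketch {
    /** Files that have a marker but are not listed in the commit's write stats. */
    static Set<String> invalidDataPaths(Set<String> markerPaths, Set<String> committedPaths) {
        Set<String> invalid = new TreeSet<>(markerPaths);
        invalid.removeAll(committedPaths);
        return invalid;
    }

    /** Files the commit metadata references that no longer exist on storage. */
    static Set<String> missingCommittedFiles(Set<String> committedPaths, Set<String> storagePaths) {
        Set<String> missing = new TreeSet<>(committedPaths);
        missing.removeAll(storagePaths);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        Set<String> markers = Set.of("p/f1_0-1-1_001.parquet", "p/f1_0-1-2_001.parquet");
        Set<String> committed = Set.of("p/f1_0-1-2_001.parquet");
        System.out.println("to delete: " + invalidDataPaths(markers, committed));
        System.out.println("dangling:  " + missingCommittedFiles(committed, Set.of()));
    }
}
```

Under that assumption, a single writer should never produce a non-empty `missingCommittedFiles` result, since anything in the write stats is excluded from deletion. A non-empty result would be consistent with two attempts reconciling the same instant against different views of the write stats, which fits the concurrency suspicion above but does not prove it.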





[GitHub] [hudi] Guanpx opened a new issue, #5330: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2022-04-15 Thread GitBox


Guanpx opened a new issue, #5330:
URL: https://github.com/apache/hudi/issues/5330

   **Describe the problem you faced**
   
   Using Flink 1.13, bucket index, COW table, Hudi 0.11.0 (not the latest).
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Start the Flink job
   2. Cancel the Flink job
   3. Repeat steps 1-2 several times
   4. Start the job again; the exception below then occurs
   
   
   **Environment Description**
   
   * Hudi version : 0.11.0
   
   * Flink version : 1.13.2
   
   * Hadoop version : 3.0.0
   
   * Storage (HDFS/S3/GCS..) :HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**

   
![image](https://user-images.githubusercontent.com/29246713/163552259-4e5f0215-e696-4b2a-a11c-4b555a2aa220.png)
   
   
   **Stacktrace**
   
   ```
   java.lang.RuntimeException: Duplicate fileID 0007----40bee2bd5a70 from bucket 7 of partition  found during the BucketStreamWriteFunction index bootstrap.
       at org.apache.hudi.sink.bucket.BucketStreamWriteFunction.lambda$bootstrapIndexIfNeed$1(BucketStreamWriteFunction.java:179)
       at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
       at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
       at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
       at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
       at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
       at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
       at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
       at org.apache.hudi.sink.bucket.BucketStreamWriteFunction.bootstrapIndexIfNeed(BucketStreamWriteFunction.java:173)
       at org.apache.hudi.sink.bucket.BucketStreamWriteFunction.processElement(BucketStreamWriteFunction.java:123)
       at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
       at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
       at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
       at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
       at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
       at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
       at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
       at java.lang.Thread.run(Thread.java:748)
   
   ```
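
The check that raises this exception can be pictured with a small sketch. This is not the code in `BucketStreamWriteFunction`: the class and method names are hypothetical, and it assumes the bucket number is encoded as an 8-digit zero-padded prefix of the file ID. The idea is that the bootstrap maps each bucket to exactly one file ID and fails when a second file ID shows up for a bucket that is already mapped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BucketBootstrapSketch {
    /** Assumed encoding: the first 8 characters of the file ID are the zero-padded bucket number. */
    static int bucketIdOf(String fileId) {
        return Integer.parseInt(fileId.substring(0, 8));
    }

    /** Maps each bucket to its single file ID; a second file ID for the same bucket is an error. */
    static Map<Integer, String> bootstrap(List<String> fileIds) {
        Map<Integer, String> bucketToFileId = new HashMap<>();
        for (String fileId : fileIds) {
            int bucket = bucketIdOf(fileId);
            String previous = bucketToFileId.putIfAbsent(bucket, fileId);
            if (previous != null) {
                throw new RuntimeException(
                    "Duplicate fileID " + fileId + " from bucket " + bucket
                        + " (already mapped to " + previous + ")");
            }
        }
        return bucketToFileId;
    }

    public static void main(String[] args) {
        // Hypothetical file IDs, for illustration only.
        System.out.println(bootstrap(List.of("00000003-aaaa", "00000007-bbbb")));
    }
}
```

If the real check works along these lines, repeatedly starting and cancelling the job could leave more than one file group for the same bucket in a partition, and the next bootstrap would then trip over it. That is a guess from the reproduction steps, not a confirmed diagnosis.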
   
   





[GitHub] [hudi] XuQianJin-Stars commented on issue #5327: [SUPPORT]Mor table hive synchronization supports more flexible configuration

2022-04-15 Thread GitBox


XuQianJin-Stars commented on issue #5327:
URL: https://github.com/apache/hudi/issues/5327#issuecomment-1099967428

   > > Here we need to add some configuration of synchronization rules.
   > 
   > Is there some solution design for synchronization rules now? In addition 
to the two points mentioned above, are there other optimizations? Because the 
above two points have been optimized in our practice, I don't know if we can 
contribute.
   
   Sure, you are welcome to contribute.




