[jira] [Resolved] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui resolved HUDI-1173.
-
  Assignee: liujinhui
Resolution: Fixed

Has been merged into master

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui closed HUDI-1173.
---

Has been merged into master

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui updated HUDI-1173:

Status: In Progress  (was: Open)

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui updated HUDI-1173:

Comment: was deleted

(was: Has been merged into master)

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui updated HUDI-1173:

Status: Open  (was: New)

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua commented on pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


yanghua commented on pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#issuecomment-671742888


   > @xushiyan @yanghua : This PR is causing a lot of merge conflicts with a 
blocker PR which we need to merge by tonight, and I am unable to resolve the 
conflicts in time. I am reverting this PR for now. Can you kindly re-merge this 
PR once 0.6 is cut?
   > 
   > Thanks,
   > Balaji.V
   
   Ok, sorry for interrupting the release plan.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] RajasekarSribalan commented on issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-08-10 Thread GitBox


RajasekarSribalan commented on issue #1939:
URL: https://github.com/apache/hudi/issues/1939#issuecomment-671742308


   Yes @bvaradar, we do an initial bulk insert and then upserts for subsequent 
operations. I configured hoodie.copyonwrite.record.size.estimate to 128 while 
taking the initial load via bulk insert. But during subsequent upserts we hit the 
memory issues described above and the streaming jobs fail... We are sure the 
size of 10 million records is close to 10GB, and we have given sufficient 
executor memory (60GB per executor and 4 cores).
   
   We use DStreams, and each micro batch is 10 million records, roughly 10GB in 
size.
   
   We persist the RDD (10GB) to disk because we reuse it for the upsert and 
subsequent deletes. What I can see from the storage tab in Spark is that Hudi 
persists the data internally in memory. I tried configuring 
hoodie.write.status.storage.level to a disk-based level to leave more memory for 
tasks, but Hudi always persists in memory? Any thoughts on this property? Could 
this be a reason for the memory issue?
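   For reference, a minimal sketch of how these two settings might be passed on 
the Spark DataSource upsert path (the table path and field names below are 
placeholders, not taken from this issue):

```scala
// Hedged sketch: tuning record-size estimation and write-status storage level.
// `inputDf` is the batch DataFrame and `basePath` the table path (both assumed).
inputDf.write.format("hudi").
  option("hoodie.table.name", "my_table").
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "ts").
  // estimated average record size in bytes, used for file sizing
  option("hoodie.copyonwrite.record.size.estimate", "128").
  // e.g. DISK_ONLY, to spill write statuses to disk instead of executor memory
  option("hoodie.write.status.storage.level", "DISK_ONLY").
  mode("append").
  save(basePath)
```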
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1834:
URL: https://github.com/apache/hudi/pull/1834#discussion_r468332027



##
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -108,262 +106,280 @@ private[hudi] object HoodieSparkSqlWriter {
   throw new HoodieException(s"hoodie table with name 
$existingTableName already exist at $basePath")
 }
   }
-  val (writeStatuses, writeClient: 
HoodieWriteClient[HoodieRecordPayload[Nothing]]) =
-if (!operation.equalsIgnoreCase(DELETE_OPERATION_OPT_VAL)) {
-  // register classes & schemas
-  val structName = s"${tblName}_record"
-  val nameSpace = s"hoodie.${tblName}"
-  sparkContext.getConf.registerKryoClasses(
-Array(classOf[org.apache.avro.generic.GenericData],
-  classOf[org.apache.avro.Schema]))
-  val schema = 
AvroConversionUtils.convertStructTypeToAvroSchema(df.schema, structName, 
nameSpace)
-  sparkContext.getConf.registerAvroSchemas(schema)
-  log.info(s"Registered avro schema : ${schema.toString(true)}")
-
-  // Convert to RDD[HoodieRecord]
-  val keyGenerator = 
DataSourceUtils.createKeyGenerator(toProperties(parameters))
-  val genericRecords: RDD[GenericRecord] = 
AvroConversionUtils.createRdd(df, structName, nameSpace)
-  val hoodieAllIncomingRecords = genericRecords.map(gr => {
-val orderingVal = HoodieAvroUtils.getNestedFieldVal(gr, 
parameters(PRECOMBINE_FIELD_OPT_KEY), false)
-  .asInstanceOf[Comparable[_]]
-DataSourceUtils.createHoodieRecord(gr,
-  orderingVal, keyGenerator.getKey(gr),
-  parameters(PAYLOAD_CLASS_OPT_KEY))
-  }).toJavaRDD()
-
-  // Handle various save modes
-  if (mode == SaveMode.ErrorIfExists && exists) {
-throw new HoodieException(s"hoodie table at $basePath already 
exists.")
-  }
 
-  if (mode == SaveMode.Overwrite && exists) {
-log.warn(s"hoodie table at $basePath already exists. Deleting 
existing data & overwriting with new data.")
-fs.delete(basePath, true)
-exists = false
-  }
+  val (writeSuccessfulRetVal: Boolean, commitTimeRetVal: 
common.util.Option[String], compactionInstantRetVal: common.util.Option[String],

Review comment:
   this whole block is not indented at the right level. I am going to try to 
apply the changes from this file line-by-line onto the latest file on master.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


bvaradar commented on pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#issuecomment-671730869


   Thanks @xushiyan 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhedoubushishi commented on pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-10 Thread GitBox


zhedoubushishi commented on pull request #1870:
URL: https://github.com/apache/hudi/pull/1870#issuecomment-671730790


   LGTM. Thanks for the implementation of the versioning part, @bvaradar!
   I only left some minor comments. I noticed that there's a conflict with 
another commit, but it seems you just reverted that commit.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan commented on pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


xushiyan commented on pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#issuecomment-671730074


   @bvaradar no worries... I can do another one.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: Revert "[HUDI-781] Introduce HoodieTestTable for test preparation (#1871)"

2020-08-10 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 626f78f  Revert "[HUDI-781] Introduce HoodieTestTable for test 
preparation (#1871)"
626f78f is described below

commit 626f78f6f639cae2d3d57d29e7ef0642cb0be7ee
Author: Balaji Varadarajan 
AuthorDate: Mon Aug 10 22:13:02 2020 -0700

Revert "[HUDI-781] Introduce HoodieTestTable for test preparation (#1871)"

This reverts commit b2e703d4427abca02b053facd5058aa256ef.
---
 .../org/apache/hudi/io/HoodieAppendHandle.java |   1 -
 .../org/apache/hudi/io/HoodieCreateHandle.java |   1 -
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |   1 -
 .../java/org/apache/hudi/io/HoodieWriteHandle.java |   3 +-
 .../src/main/java/org/apache/hudi/io}/IOType.java  |  15 +-
 .../java/org/apache/hudi/table/MarkerFiles.java|  15 +-
 .../rollback/MarkerBasedRollbackStrategy.java  |   8 +-
 .../table/upgrade/ZeroToOneUpgradeHandler.java |   2 +-
 .../TestHoodieClientOnCopyOnWriteStorage.java  |   2 +-
 .../java/org/apache/hudi/table/TestCleaner.java| 393 +
 .../apache/hudi/table/TestConsistencyGuard.java|  28 +-
 .../org/apache/hudi/table/TestMarkerFiles.java |  10 +-
 .../table/action/commit/TestUpsertPartitioner.java |   8 +-
 .../table/action/compact/TestHoodieCompactor.java  |   7 +-
 .../rollback/TestMarkerBasedRollbackStrategy.java  |  69 ++--
 .../hudi/testutils/HoodieClientTestUtils.java  |  99 --
 .../hudi/common/testutils/FileCreateUtils.java | 113 --
 .../hudi/common/testutils/HoodieTestTable.java | 232 
 .../hudi/common/testutils/HoodieTestUtils.java | 102 +++---
 19 files changed, 467 insertions(+), 642 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
index 7996a77..7a8e5ab 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
@@ -32,7 +32,6 @@ import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
-import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.table.log.HoodieLogFormat;
 import org.apache.hudi.common.table.log.HoodieLogFormat.Writer;
 import org.apache.hudi.common.table.log.block.HoodieDataBlock;
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
index 5a76dc7..705e98d 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
@@ -28,7 +28,6 @@ import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
-import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
index 8d54065..f0ea284 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
@@ -29,7 +29,6 @@ import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
-import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.util.DefaultSizeEstimator;
 import org.apache.hudi.common.util.HoodieRecordSizeEstimator;
 import org.apache.hudi.common.util.Option;
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
index 5ea8c38..d148b1b 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
@@ -24,7 +24,6 @@ import org.apache.hudi.client.WriteStatus;
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
-import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.util.HoodieTimer;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.c

[GitHub] [hudi] bvaradar commented on pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


bvaradar commented on pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#issuecomment-671729188


   @xushiyan @yanghua : This PR is causing a lot of merge conflicts with a blocker 
PR which we need to merge by tonight, and I am unable to resolve the conflicts in 
time. I am reverting this PR for now. Can you kindly re-merge this PR once 0.6 
is cut?
   
   Thanks,
   Balaji.V



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan commented on pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


xushiyan commented on pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#issuecomment-671721410


   @vinothchandar sorry, I'll keep the PRs in draft state until the cut.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] harishchanderramesh edited a comment on issue #1936: Hudi Query Error

2020-08-10 Thread GitBox


harishchanderramesh edited a comment on issue #1936:
URL: https://github.com/apache/hudi/issues/1936#issuecomment-671719305


   Hi @umehrot2,
   
   Please find my responses below.
   
   Are you able to do a simple aws s3 ls and list or get anything from your 
cluster on S3?
   **_Yes, I am able to._**
   Are you configuring to use S3A instead of EmrFS as the filesystem on your 
EMR cluster?
   **_No, I am not configuring any file system explicitly._**
   Are you running any EMR bootstrap actions that change sdk/http client 
versions on the cluster?
   **_No, I am not._**
   Are you changing spark driver/executor classpaths?
   **_No, I am not changing the classpaths._**
   Is this happening specifically for hudi tables? Or for non-hudi tables as 
well?
   **_Only for Hudi tables. The other Delta IO tables are working fine._**



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] harishchanderramesh edited a comment on issue #1936: Hudi Query Error

2020-08-10 Thread GitBox


harishchanderramesh edited a comment on issue #1936:
URL: https://github.com/apache/hudi/issues/1936#issuecomment-671719305


   Hi @umehrot2,
   
   Please find my responses below.
   
   Are you able to do a simple aws s3 ls and list or get anything from your 
cluster on S3?
   _Yes, I am able to._
   Are you configuring to use S3A instead of EmrFS as the filesystem on your 
EMR cluster?
   _No, I am not configuring any file system explicitly._
   Are you running any EMR bootstrap actions that change sdk/http client 
versions on the cluster?
   _No, I am not._
   Are you changing spark driver/executor classpaths?
   _No, I am not changing the classpaths._
   Is this happening specifically for hudi tables? Or for non-hudi tables as 
well?
   _Only for Hudi tables. The other Delta IO tables are working fine._



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] harishchanderramesh edited a comment on issue #1936: Hudi Query Error

2020-08-10 Thread GitBox


harishchanderramesh edited a comment on issue #1936:
URL: https://github.com/apache/hudi/issues/1936#issuecomment-671719305


   Hi @umehrot2,
   
   Please find my responses below.
   
   Are you able to do a simple aws s3 ls and list or get anything from your 
cluster on S3?
   - Yes, I am able to.
   Are you configuring to use S3A instead of EmrFS as the filesystem on your 
EMR cluster?
   - No, I am not configuring any file system explicitly.
   Are you running any EMR bootstrap actions that change sdk/http client 
versions on the cluster?
   - No, I am not.
   Are you changing spark driver/executor classpaths?
   - No, I am not changing the classpaths.
   Is this happening specifically for hudi tables? Or for non-hudi tables as 
well?
   - Only for Hudi tables. The other Delta IO tables are working fine.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] harishchanderramesh commented on issue #1936: Hudi Query Error

2020-08-10 Thread GitBox


harishchanderramesh commented on issue #1936:
URL: https://github.com/apache/hudi/issues/1936#issuecomment-671719305


   Hi @umehrot2,
   
   Are you able to do a simple aws s3 ls and list or get anything from your 
cluster on S3?
   - Yes, I am able to.
   Are you configuring to use S3A instead of EmrFS as the filesystem on your 
EMR cluster?
   - No, I am not configuring any file system explicitly.
   Are you running any EMR bootstrap actions that change sdk/http client 
versions on the cluster?
   - No, I am not.
   Are you changing spark driver/executor classpaths?
   - No, I am not changing the classpaths.
   Is this happening specifically for hudi tables? Or for non-hudi tables as 
well?
   - Only for Hudi tables. The other Delta IO tables are working fine.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-1175] Commenting out testsuite tests from Integration tests until we investigate the CI flakiness (#1945)

2020-08-10 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9c24151  [HUDI-1175] Commenting out testsuite tests from Integration 
tests until we investigate the CI flakiness (#1945)
9c24151 is described below

commit 9c24151929659f80f88607f068662fdc5855fc91
Author: Sivabalan Narayanan 
AuthorDate: Tue Aug 11 00:00:57 2020 -0400

[HUDI-1175] Commenting out testsuite tests from Integration tests until we 
investigate the CI flakiness (#1945)
---
 .../{compaction.commands => compaction-bootstrap.commands}   |  4 
 docker/demo/compaction.commands  |  4 
 .../src/test/java/org/apache/hudi/integ/ITTestBase.java  | 12 +---
 .../test/java/org/apache/hudi/integ/ITTestHoodieDemo.java|  2 ++
 pom.xml  |  1 -
 5 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/docker/demo/compaction.commands 
b/docker/demo/compaction-bootstrap.commands
similarity index 80%
copy from docker/demo/compaction.commands
copy to docker/demo/compaction-bootstrap.commands
index 6abdad7..6c246be 100644
--- a/docker/demo/compaction.commands
+++ b/docker/demo/compaction-bootstrap.commands
@@ -15,10 +15,6 @@
 #  See the License for the specific language governing permissions and
 # limitations under the License.
 
-connect --path /user/hive/warehouse/stock_ticks_mor
-compactions show all
-compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
-compaction run --parallelism 2 --sparkMemory 1G  --schemaFilePath 
/var/demo/config/schema.avsc --retry 1 
 connect --path /user/hive/warehouse/stock_ticks_mor_bs
 compactions show all
 compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
diff --git a/docker/demo/compaction.commands b/docker/demo/compaction.commands
index 6abdad7..a8baaff 100644
--- a/docker/demo/compaction.commands
+++ b/docker/demo/compaction.commands
@@ -19,7 +19,3 @@ connect --path /user/hive/warehouse/stock_ticks_mor
 compactions show all
 compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
 compaction run --parallelism 2 --sparkMemory 1G  --schemaFilePath 
/var/demo/config/schema.avsc --retry 1 
-connect --path /user/hive/warehouse/stock_ticks_mor_bs
-compactions show all
-compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
-compaction run --parallelism 2 --sparkMemory 1G  --schemaFilePath 
/var/demo/config/schema.avsc --retry 1 
diff --git 
a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java 
b/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
index 0423103..d0b32ee 100644
--- a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
+++ b/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
@@ -115,11 +115,9 @@ public abstract class ITTestBase {
   }
 
   static String getPrestoConsoleCommand(String commandFile) {
-StringBuilder builder = new StringBuilder().append("presto --server " + 
PRESTO_COORDINATOR_URL)
+return new StringBuilder().append("presto --server " + 
PRESTO_COORDINATOR_URL)
 .append(" --catalog hive --schema default")
-.append(" -f " + commandFile);
-System.out.println("Presto comamnd " + builder.toString());
-return builder.toString();
+.append(" -f " + commandFile).toString();
   }
 
   @BeforeEach
@@ -166,14 +164,14 @@ public abstract class ITTestBase {
 
 boolean completed =
   
dockerClient.execStartCmd(createCmdResponse.getId()).withDetach(false).withTty(false).exec(callback)
-.awaitCompletion(900, SECONDS);
+.awaitCompletion(540, SECONDS);
 if (!completed) {
   callback.getStderr().flush();
   callback.getStdout().flush();
   LOG.error("\n\n ## Timed Out Command : " +  Arrays.asList(command));
   LOG.error("\n\n ## Stderr of timed-out command ###\n" + 
callback.getStderr().toString());
-  LOG.error("\n\n ## stdout of timed-out command ###\n" + 
callback.getStderr().toString());
-  throw new TimeoutException("Command " + command +  " has been running 
for more than 15 minutes. "
+  LOG.error("\n\n ## stdout of timed-out command ###\n" + 
callback.getStdout().toString());
+  throw new TimeoutException("Command " + command +  " has been running 
for more than 9 minutes. "
 + "Killing and failing !!");
 }
 int exitCode = 
dockerClient.inspectExecCmd(createCmdResponse.getId()).exec().getExitCode();
diff --git 
a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestHoodieDemo.java 
b/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestHoodieDemo.java
index d2a0841..eb608df 100644
--- a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestHoodieDemo.java
+++ b/hudi-integ-test/src/test/java/org/apa

[GitHub] [hudi] vinothchandar merged pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


vinothchandar merged pull request #1945:
URL: https://github.com/apache/hudi/pull/1945


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1834:
URL: https://github.com/apache/hudi/pull/1834#discussion_r468311412



##
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -108,262 +106,280 @@ private[hudi] object HoodieSparkSqlWriter {
   throw new HoodieException(s"hoodie table with name 
$existingTableName already exist at $basePath")
 }
   }
-  val (writeStatuses, writeClient: 
HoodieWriteClient[HoodieRecordPayload[Nothing]]) =
-if (!operation.equalsIgnoreCase(DELETE_OPERATION_OPT_VAL)) {
-  // register classes & schemas
-  val structName = s"${tblName}_record"
-  val nameSpace = s"hoodie.${tblName}"
-  sparkContext.getConf.registerKryoClasses(
-Array(classOf[org.apache.avro.generic.GenericData],
-  classOf[org.apache.avro.Schema]))
-  val schema = 
AvroConversionUtils.convertStructTypeToAvroSchema(df.schema, structName, 
nameSpace)
-  sparkContext.getConf.registerAvroSchemas(schema)
-  log.info(s"Registered avro schema : ${schema.toString(true)}")
-
-  // Convert to RDD[HoodieRecord]
-  val keyGenerator = 
DataSourceUtils.createKeyGenerator(toProperties(parameters))
-  val genericRecords: RDD[GenericRecord] = 
AvroConversionUtils.createRdd(df, structName, nameSpace)
-  val hoodieAllIncomingRecords = genericRecords.map(gr => {
-val orderingVal = HoodieAvroUtils.getNestedFieldVal(gr, 
parameters(PRECOMBINE_FIELD_OPT_KEY), false)
-  .asInstanceOf[Comparable[_]]
-DataSourceUtils.createHoodieRecord(gr,
-  orderingVal, keyGenerator.getKey(gr),
-  parameters(PAYLOAD_CLASS_OPT_KEY))
-  }).toJavaRDD()
-
-  // Handle various save modes
-  if (mode == SaveMode.ErrorIfExists && exists) {
-throw new HoodieException(s"hoodie table at $basePath already 
exists.")
-  }
 
-  if (mode == SaveMode.Overwrite && exists) {
-log.warn(s"hoodie table at $basePath already exists. Deleting 
existing data & overwriting with new data.")
-fs.delete(basePath, true)
-exists = false
-  }
+  val (writeSuccessfulRetVal: Boolean, commitTimeRetVal: 
common.util.Option[String], compactionInstantRetVal: common.util.Option[String],
+  writeClientRetVal: HoodieWriteClient[HoodieRecordPayload[Nothing]], 
tableConfigRetVal: HoodieTableConfig) =
+ if 
(operation.equalsIgnoreCase(BULK_INSERT_DATASET_OPERATION_OPT_VAL)) {
+// register classes & schemas
+val structName = s"${tblName}_record"
+val nameSpace = s"hoodie.${tblName}"
 
-  // Create the table if not present
-  if (!exists) {
-//FIXME(bootstrap): bootstrapIndexClass needs to be set when 
bootstrap index class is integrated.
-val tableMetaClient = 
HoodieTableMetaClient.initTableTypeWithBootstrap(sparkContext.hadoopConfiguration,
-  path.get, HoodieTableType.valueOf(tableType),
-  tblName, "archived", parameters(PAYLOAD_CLASS_OPT_KEY), null, 
null, null)
-tableConfig = tableMetaClient.getTableConfig
-  }
+// Handle various save modes
+if (mode == SaveMode.ErrorIfExists && exists) {
+  throw new HoodieException(s"hoodie table at $basePath already 
exists.")
+}
 
-  // Create a HoodieWriteClient & issue the write.
-  val client = 
hoodieWriteClient.getOrElse(DataSourceUtils.createHoodieClient(jsc, 
schema.toString, path.get,
-tblName, mapAsJavaMap(parameters)
-  )).asInstanceOf[HoodieWriteClient[HoodieRecordPayload[Nothing]]]
+val (success, commitTime: common.util.Option[String]) =
+  if (mode == SaveMode.Ignore && exists) {
+log.warn(s"hoodie table at $basePath already exists. Ignoring & 
not performing actual writes.")
+(false, common.util.Option.ofNullable(instantTime))
+  } else {
+if (mode == SaveMode.Overwrite && exists) {
+  log.warn(s"hoodie table at $basePath already exists. Deleting 
existing data & overwriting with new data.")
+  fs.delete(basePath, true)
+  exists = false
+}
 
-  if (asyncCompactionTriggerFn.isDefined &&
-isAsyncCompactionEnabled(client, tableConfig, parameters, 
jsc.hadoopConfiguration())) {
-asyncCompactionTriggerFn.get.apply(client)
-  }
+// Create the table if not present
+if (!exists) {
+  //FIXME(bootstrap): bootstrapIndexClass needs to be set when 
bootstrap index class is integrated.
+  val tableMetaClient = 
HoodieTableMetaClient.initTableTypeWithBootstrap(sparkContext.hadoopConfiguration,
+path.get, HoodieTableType.valueOf(tableType),
+tblName, "archived", parameters(PAYLOAD_CLASS_O

[GitHub] [hudi] vinothchandar commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1834:
URL: https://github.com/apache/hudi/pull/1834#discussion_r468311076



##
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -108,262 +106,280 @@ private[hudi] object HoodieSparkSqlWriter {
   throw new HoodieException(s"hoodie table with name 
$existingTableName already exist at $basePath")
 }
   }
-  val (writeStatuses, writeClient: 
HoodieWriteClient[HoodieRecordPayload[Nothing]]) =
-if (!operation.equalsIgnoreCase(DELETE_OPERATION_OPT_VAL)) {
-  // register classes & schemas
-  val structName = s"${tblName}_record"
-  val nameSpace = s"hoodie.${tblName}"
-  sparkContext.getConf.registerKryoClasses(
-Array(classOf[org.apache.avro.generic.GenericData],
-  classOf[org.apache.avro.Schema]))
-  val schema = 
AvroConversionUtils.convertStructTypeToAvroSchema(df.schema, structName, 
nameSpace)
-  sparkContext.getConf.registerAvroSchemas(schema)
-  log.info(s"Registered avro schema : ${schema.toString(true)}")
-
-  // Convert to RDD[HoodieRecord]
-  val keyGenerator = 
DataSourceUtils.createKeyGenerator(toProperties(parameters))
-  val genericRecords: RDD[GenericRecord] = 
AvroConversionUtils.createRdd(df, structName, nameSpace)
-  val hoodieAllIncomingRecords = genericRecords.map(gr => {
-val orderingVal = HoodieAvroUtils.getNestedFieldVal(gr, 
parameters(PRECOMBINE_FIELD_OPT_KEY), false)
-  .asInstanceOf[Comparable[_]]
-DataSourceUtils.createHoodieRecord(gr,
-  orderingVal, keyGenerator.getKey(gr),
-  parameters(PAYLOAD_CLASS_OPT_KEY))
-  }).toJavaRDD()
-
-  // Handle various save modes
-  if (mode == SaveMode.ErrorIfExists && exists) {
-throw new HoodieException(s"hoodie table at $basePath already 
exists.")
-  }
 
-  if (mode == SaveMode.Overwrite && exists) {
-log.warn(s"hoodie table at $basePath already exists. Deleting 
existing data & overwriting with new data.")
-fs.delete(basePath, true)
-exists = false
-  }
+  val (writeSuccessfulRetVal: Boolean, commitTimeRetVal: 
common.util.Option[String], compactionInstantRetVal: common.util.Option[String],
+  writeClientRetVal: HoodieWriteClient[HoodieRecordPayload[Nothing]], 
tableConfigRetVal: HoodieTableConfig) =
+ if 
(operation.equalsIgnoreCase(BULK_INSERT_DATASET_OPERATION_OPT_VAL)) {
+// register classes & schemas
+val structName = s"${tblName}_record"
+val nameSpace = s"hoodie.${tblName}"
 
-  // Create the table if not present
-  if (!exists) {
-//FIXME(bootstrap): bootstrapIndexClass needs to be set when 
bootstrap index class is integrated.
-val tableMetaClient = 
HoodieTableMetaClient.initTableTypeWithBootstrap(sparkContext.hadoopConfiguration,
-  path.get, HoodieTableType.valueOf(tableType),
-  tblName, "archived", parameters(PAYLOAD_CLASS_OPT_KEY), null, 
null, null)
-tableConfig = tableMetaClient.getTableConfig
-  }
+// Handle various save modes
+if (mode == SaveMode.ErrorIfExists && exists) {
+  throw new HoodieException(s"hoodie table at $basePath already 
exists.")
+}
 
-  // Create a HoodieWriteClient & issue the write.
-  val client = 
hoodieWriteClient.getOrElse(DataSourceUtils.createHoodieClient(jsc, 
schema.toString, path.get,
-tblName, mapAsJavaMap(parameters)
-  )).asInstanceOf[HoodieWriteClient[HoodieRecordPayload[Nothing]]]
+val (success, commitTime: common.util.Option[String]) =
+  if (mode == SaveMode.Ignore && exists) {
+log.warn(s"hoodie table at $basePath already exists. Ignoring & 
not performing actual writes.")
+(false, common.util.Option.ofNullable(instantTime))
+  } else {
+if (mode == SaveMode.Overwrite && exists) {
+  log.warn(s"hoodie table at $basePath already exists. Deleting 
existing data & overwriting with new data.")
+  fs.delete(basePath, true)
+  exists = false
+}
 
-  if (asyncCompactionTriggerFn.isDefined &&
-isAsyncCompactionEnabled(client, tableConfig, parameters, 
jsc.hadoopConfiguration())) {
-asyncCompactionTriggerFn.get.apply(client)
-  }
+// Create the table if not present
+if (!exists) {
+  //FIXME(bootstrap): bootstrapIndexClass needs to be set when 
bootstrap index class is integrated.
+  val tableMetaClient = 
HoodieTableMetaClient.initTableTypeWithBootstrap(sparkContext.hadoopConfiguration,
+path.get, HoodieTableType.valueOf(tableType),
+tblName, "archived", parameters(PAYLOAD_CLASS_O

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #366

2020-08-10 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.58 KB...]
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities-bundle_${scala.binary.version}:[unknown-version],
 

 line 27, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effec

[GitHub] [hudi] vinothchandar commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1834:
URL: https://github.com/apache/hudi/pull/1834#discussion_r461939866



##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/HoodieInternalWriteStatus.java
##
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.hudi.common.model.HoodieWriteStat;
+import org.apache.hudi.common.util.collection.Pair;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+/**
+ * Hoodie's internal write status used in datasource implementation of bulk 
insert.
+ */
+public class HoodieInternalWriteStatus implements Serializable {

Review comment:
   so, this needs to be a separate class, because?

##
File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
##
@@ -54,12 +51,17 @@ public String getPartitionPath(GenericRecord record) {
   }
 
   @Override
-  public List getRecordKeyFields() {
-return recordKeyFields;
+  public List getPartitionPathFields() {
+return new ArrayList<>();
   }
 
   @Override
-  public List getPartitionPathFields() {
-return new ArrayList<>();
+  public String getRecordKeyFromRow(Row row) {
+return RowKeyGeneratorHelper.getRecordKeyFromRow(row, 
getRecordKeyFields(), getRecordKeyPositions(), true);
+  }
+
+  @Override
+  public String getPartitionPathFromRow(Row row) {

Review comment:
   one issue we need to think about is how we abstract the key generators 
out, so that even Flink etc. can use this. Ideally we need to templatize 
`GenericRecord` and `Row`; this needs more thought, and is potentially beyond the 
scope of this PR.
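   As a purely hypothetical illustration of that direction (not an existing Hudi 
interface), the key generator could be templatized over the record type so that 
each engine binds its own row representation:

```scala
// Hypothetical sketch only: key generation parameterized over the record type,
// so Spark's Row, Avro's GenericRecord, or a Flink row type could each plug in.
trait TypedKeyGenerator[R] {
  def getRecordKey(record: R): String
  def getPartitionPath(record: R): String
}

// Example binding for Spark rows; field positions are assumed to be resolved up front.
class SimpleRowKeyGenerator(keyPos: Int, partitionPos: Int)
    extends TypedKeyGenerator[org.apache.spark.sql.Row] {
  override def getRecordKey(row: org.apache.spark.sql.Row): String =
    row.get(keyPos).toString
  override def getPartitionPath(row: org.apache.spark.sql.Row): String =
    row.get(partitionPos).toString
}
```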

##
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -129,6 +129,7 @@ object DataSourceWriteOptions {
   val INSERT_OPERATION_OPT_VAL = "insert"
   val UPSERT_OPERATION_OPT_VAL = "upsert"
   val DELETE_OPERATION_OPT_VAL = "delete"
+  val BULK_INSERT_DATASET_OPERATION_OPT_VAL = "bulk_insert_dataset"

Review comment:
   we need not overload the operation type here. we can just introduce a 
boolean option separately. 
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1176) Support log4j2 config

2020-08-10 Thread hong dongdong (Jira)
hong dongdong created HUDI-1176:
---

 Summary: Support log4j2 config
 Key: HUDI-1176
 URL: https://issues.apache.org/jira/browse/HUDI-1176
 Project: Apache Hudi
  Issue Type: Bug
  Components: Testing
Reporter: hong dongdong
Assignee: hong dongdong


Some modules (like cli, client) now use log4j2, and they cannot correctly load a 
config file (ERROR StatusLogger No log4j2 configuration file found. Using 
default configuration: logging only errors to the console.)
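
For reference, a minimal log4j2.xml placed on the module's (test) classpath is 
enough to make that StatusLogger error go away. This is only a generic example, 
not the configuration proposed for Hudi:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
{code}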



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] bvaradar commented on issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-08-10 Thread GitBox


bvaradar commented on issue #1939:
URL: https://github.com/apache/hudi/issues/1939#issuecomment-671690639


   To understand: are you using bulk insert for the initial loading and upsert for 
subsequent operations?
   For records with LOBs, it is important to tune 
hoodie.copyonwrite.record.size.estimate during the initial bootstrap to get the 
file sizing right.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #1813: ERROR HoodieDeltaStreamer: Got error running delta sync once.

2020-08-10 Thread GitBox


bvaradar commented on issue #1813:
URL: https://github.com/apache/hudi/issues/1813#issuecomment-671687932


   @tooptoop4 : The checkpoints are stored as part of the .commit files in the 
.hoodie folder, and will persist across cluster and application restarts.
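   A quick way to see this, sketched below for a table on the local filesystem 
(the path is a placeholder): DeltaStreamer typically records its checkpoint in the 
commit metadata under a "deltastreamer.checkpoint.key" entry, so the latest 
.commit file shows it.

```scala
// Hedged sketch: print checkpoint-related entries from the latest commit file.
import java.io.File
import scala.io.Source

val basePath = "/data/hudi/my_table"                       // placeholder table path
val commits = Option(new File(s"$basePath/.hoodie").listFiles())
  .getOrElse(Array.empty[File])
  .filter(f => f.isFile && f.getName.endsWith(".commit"))
  .sortBy(_.getName)                                       // instant times sort lexicographically

commits.lastOption.foreach { latest =>
  val metadata = Source.fromFile(latest).mkString          // commit files are JSON
  metadata.split("\n").filter(_.contains("deltastreamer.checkpoint")).foreach(println)
}
```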



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-08-10 Thread GitBox


bvaradar commented on issue #1925:
URL: https://github.com/apache/hudi/issues/1925#issuecomment-671686601


   @jpugliesi : With Spark DataSource writes, the schema is implicitly derived 
from the input DataFrame we want to write. Is there a specific use-case you 
have in mind?
   
   Since DeltaStreamer is a generic ingestion tool, it made sense to provide a 
framework to plug in schema providers.
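   For reference, a minimal sketch of that DataSource path (bucket, table name 
and field names below are placeholders): the schema Hudi writes with is derived 
from df.schema, so no schema provider needs to be configured.

```scala
// Hedged sketch: the write schema comes from the DataFrame itself.
// Assumes an active SparkSession named `spark`.
val df = spark.read.json("s3://my-bucket/input/")          // schema inferred from the data
df.write.format("hudi").
  option("hoodie.table.name", "my_table").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "ts").
  mode("append").
  save("s3://my-bucket/hudi/my_table")
```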



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


vinothchandar commented on pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#issuecomment-671685350


   @yanghua @xushiyan can we please hold off on these refactoring PRs until we 
cut the RC? We are trying to land the last bits, and the rebase effort from 
these keeps stretching the timeline.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: Travis CI build asf-site

2020-08-10 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 0b938d5  Travis CI build asf-site
0b938d5 is described below

commit 0b938d5ab64a004a6da6f93a1f2a828fa6a76a3c
Author: CI 
AuthorDate: Tue Aug 11 02:02:22 2020 +

Travis CI build asf-site
---
 content/cn/docs/configurations.html | 28 
 content/docs/configurations.html| 28 
 2 files changed, 56 insertions(+)

diff --git a/content/cn/docs/configurations.html 
b/content/cn/docs/configurations.html
index 589e8d4..5636b76 100644
--- a/content/cn/docs/configurations.html
+++ b/content/cn/docs/configurations.html
@@ -368,6 +368,7 @@
   压缩配置
   指标配置
   内存配置
+  写提交回调配置
 
   
 
@@ -909,6 +910,33 @@ Hudi提供了一个选项,可以通过将对该分区中的插入作为对现
 属性:hoodie.memory.writestatus.failure.fraction 
 此属性控制报告给驱动程序的失败记录和异常的比例
 
+写提交回调配置
+控制写提交的回调。 如果用户启用了回调并且回调过程发生了错误,则会抛出异常。 当前只支持Http回调方式,Kafka不久后会支持。
+withCallbackConfig 
(HoodieWriteCommitCallbackConfig) 
+写提交回调相关配置
+
+writeCommitCallbackOn(callbackOn = false)
+Property: hoodie.write.commit.callback.on 
+打开或关闭回调功能. 默认关闭.
+
+withCallbackClass(callbackClass)
+Property: hoodie.write.commit.callback.class 
+用户自定义回调的类全路径名,回调类必须为HoodieWriteCommitCallback的子类。默认 
org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback
+
+HoodieWriteCommitHttpCallback
+
+withCallbackHttpUrl(url)
+Property: hoodie.write.commit.callback.http.url 
+Http回调主机,回调信息将会发送到该主机
+
+withCallbackHttpTimeoutSeconds(timeoutSeconds
 = 3)
+Property: hoodie.write.commit.callback.http.timeout.seconds
 
+Http回调超时时间(单位秒),默认3秒
+
+withCallbackHttpApiKey(apiKey)
+Property: hoodie.write.commit.callback.http.api.key 
+Http 回调秘钥. 默认 
hudi_write_commit_http_callback
+
   
 
   Back to top 
↑
diff --git a/content/docs/configurations.html b/content/docs/configurations.html
index 69aab3f..69e45cb 100644
--- a/content/docs/configurations.html
+++ b/content/docs/configurations.html
@@ -379,6 +379,7 @@
   Compaction configs
   Metrics configs
   Memory configs
+  Write commit callback 
configs
 
   
 
@@ -884,6 +885,33 @@ HoodieWriteConfig can be built using a builder pattern as 
below.
 Property: hoodie.memory.writestatus.failure.fraction 
 This property controls what fraction of the failed 
record, exceptions we report back to driver
 
+Write commit callback configs
+Controls callback behavior on write commit. Exception will be thrown if 
user enabled the callback service and errors occurred during the process of 
callback. Currently support http callback only, kafka implementation will be 
supported in the near future. 
+withCallbackConfig 
(HoodieWriteCommitCallbackConfig) 
+Callback related configs
+
+writeCommitCallbackOn(callbackOn = false)
+Property: hoodie.write.commit.callback.on 
+Turn callback on/off. off by default.
+
+withCallbackClass(callbackClass)
+Property: hoodie.write.commit.callback.class 
+Full path of user-defined callback class and must be 
a subclass of HoodieWriteCommitCallback class, 
org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback by 
default
+
+HoodieWriteCommitHttpCallback
+
+withCallbackHttpUrl(url)
+Property: hoodie.write.commit.callback.http.url 
+Callback host to be sent along with callback 
messages
+
+withCallbackHttpTimeoutSeconds(timeoutSeconds
 = 3)
+Property: hoodie.write.commit.callback.http.timeout.seconds
 
+Callback timeout in seconds. 3 by default
+
+withCallbackHttpApiKey(apiKey)
+Property: hoodie.write.commit.callback.http.api.key 
+Http callback API key. 
hudi_write_commit_http_callback by default
+
   
 
   Back to top 
↑



[jira] [Closed] (HUDI-1121) Provide a document describing how to use callback

2020-08-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1121.
--
Resolution: Done

Done via asf-site branch: a6f991c4c2d72166fe8c898b6f63bb1d16ccd7a0

> Provide a document describing how to use callback
> -
>
> Key: HUDI-1121
> URL: https://issues.apache.org/jira/browse/HUDI-1121
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1121) Provide a document describing how to use callback

2020-08-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-1121:
---
Status: Open  (was: New)

> Provide a document describing how to use callback
> -
>
> Key: HUDI-1121
> URL: https://issues.apache.org/jira/browse/HUDI-1121
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua merged pull request #1935: [HUDI-1121][DOC]Provide a document describing how to use callback

2020-08-10 Thread GitBox


yanghua merged pull request #1935:
URL: https://github.com/apache/hudi/pull/1935


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: [HUDI-1121] Provide a document describing how to use callback (#1935)

2020-08-10 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new a6f991c  [HUDI-1121] Provide a document describing how to use callback 
(#1935)
a6f991c is described below

commit a6f991c4c2d72166fe8c898b6f63bb1d16ccd7a0
Author: Mathieu 
AuthorDate: Tue Aug 11 09:59:27 2020 +0800

[HUDI-1121] Provide a document describing how to use callback (#1935)
---
 docs/_docs/2_4_configurations.cn.md | 27 +++
 docs/_docs/2_4_configurations.md| 27 +++
 2 files changed, 54 insertions(+)

diff --git a/docs/_docs/2_4_configurations.cn.md 
b/docs/_docs/2_4_configurations.cn.md
index b577990..8578285 100644
--- a/docs/_docs/2_4_configurations.cn.md
+++ b/docs/_docs/2_4_configurations.cn.md
@@ -547,3 +547,30 @@ Hudi提供了一个选项,可以通过将对该分区中的插入作为对现
  withWriteStatusFailureFraction(failureFraction = 0.1) 
{#withWriteStatusFailureFraction}
 属性:`hoodie.memory.writestatus.failure.fraction` 
 此属性控制报告给驱动程序的失败记录和异常的比例
+
+### 写提交回调配置
+控制写提交的回调。 如果用户启用了回调并且回调过程发生了错误,则会抛出异常。 当前只支持Http回调方式,Kafka不久后会支持。
+[withCallbackConfig](#withCallbackConfig) (HoodieWriteCommitCallbackConfig) 

+写提交回调相关配置
+
+# writeCommitCallbackOn(callbackOn = false) {#writeCommitCallbackOn} 
+Property: `hoodie.write.commit.callback.on` 
+打开或关闭回调功能. 默认关闭.
+
+# withCallbackClass(callbackClass) {#withCallbackClass} 
+Property: `hoodie.write.commit.callback.class` 
+用户自定义回调的类全路径名,回调类必须为HoodieWriteCommitCallback的子类。默认 
org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback
+
+ HoodieWriteCommitHttpCallback
+
+# withCallbackHttpUrl(url) {#withCallbackHttpUrl} 
+Property: `hoodie.write.commit.callback.http.url` 
+Http回调主机,回调信息将会发送到该主机
+
+# withCallbackHttpTimeoutSeconds(timeoutSeconds = 3) 
{#withCallbackHttpTimeoutSeconds} 
+Property: `hoodie.write.commit.callback.http.timeout.seconds` 
+Http回调超时时间(单位秒),默认3秒
+
+# withCallbackHttpApiKey(apiKey) {#withCallbackHttpApiKey} 
+Property: `hoodie.write.commit.callback.http.api.key` 
+Http 回调秘钥. 默认 hudi_write_commit_http_callback
diff --git a/docs/_docs/2_4_configurations.md b/docs/_docs/2_4_configurations.md
index 627d148..47bd08c 100644
--- a/docs/_docs/2_4_configurations.md
+++ b/docs/_docs/2_4_configurations.md
@@ -510,3 +510,30 @@ Property: `hoodie.memory.compaction.fraction` 
  withWriteStatusFailureFraction(failureFraction = 0.1) 
{#withWriteStatusFailureFraction}
 Property: `hoodie.memory.writestatus.failure.fraction` 
 This property controls what fraction of the failed 
record, exceptions we report back to driver
+
+### Write commit callback configs
+Controls callback behavior on write commit. Exception will be thrown if user 
enabled the callback service and errors occurred during the process of 
callback. Currently support http callback only, kafka implementation will be 
supported in the near future. 
+[withCallbackConfig](#withCallbackConfig) (HoodieWriteCommitCallbackConfig) 

+Callback related configs
+
+# writeCommitCallbackOn(callbackOn = false) {#writeCommitCallbackOn} 
+Property: `hoodie.write.commit.callback.on` 
+Turn callback on/off. off by default.
+
+# withCallbackClass(callbackClass) {#withCallbackClass} 
+Property: `hoodie.write.commit.callback.class` 
+Full path of user-defined callback class and must be 
a subclass of HoodieWriteCommitCallback class, 
org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback by default
+
+ HoodieWriteCommitHttpCallback
+
+# withCallbackHttpUrl(url) {#withCallbackHttpUrl} 
+Property: `hoodie.write.commit.callback.http.url` 
+Callback host to be sent along with callback 
messages
+
+# withCallbackHttpTimeoutSeconds(timeoutSeconds = 3) 
{#withCallbackHttpTimeoutSeconds} 
+Property: `hoodie.write.commit.callback.http.timeout.seconds` 
+Callback timeout in seconds. 3 by default
+
+# withCallbackHttpApiKey(apiKey) {#withCallbackHttpApiKey} 
+Property: `hoodie.write.commit.callback.http.api.key` 
+Http callback API key. 
hudi_write_commit_http_callback by default
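
For readers following the docs added in this commit, here is a minimal sketch of how these
properties could be supplied to a writer. Only the property keys and default values come from
the documentation above; the Properties-based wiring, the class name and the endpoint URL are
illustrative assumptions, not the canonical API.

```java
// Hedged sketch: collect the documented write-commit callback keys in a Properties
// object that can be handed to a Hudi writer. Keys and defaults are taken from the
// docs above; the endpoint URL and how the Properties get consumed are assumptions.
import java.util.Properties;

public class WriteCommitCallbackProps {
  public static Properties httpCallbackProps() {
    Properties props = new Properties();
    props.setProperty("hoodie.write.commit.callback.on", "true");                // off by default
    props.setProperty("hoodie.write.commit.callback.class",
        "org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback");          // default implementation
    props.setProperty("hoodie.write.commit.callback.http.url",
        "http://callback-host:8000/hudi-commits");                               // hypothetical endpoint
    props.setProperty("hoodie.write.commit.callback.http.timeout.seconds", "3"); // default: 3 seconds
    props.setProperty("hoodie.write.commit.callback.http.api.key",
        "hudi_write_commit_http_callback");                                      // default api key
    return props;
  }
}
```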



[GitHub] [hudi] xushiyan commented on a change in pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


xushiyan commented on a change in pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#discussion_r468279092



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java
##
@@ -237,11 +229,12 @@ public static void 
createPendingCleanFiles(HoodieTableMetaClient metaClient, Str
 
   public static void createCorruptedPendingCleanFiles(HoodieTableMetaClient 
metaClient, String commitTime) {
 Arrays.asList(HoodieTimeline.makeRequestedCleanerFileName(commitTime),
-HoodieTimeline.makeInflightCleanerFileName(commitTime)).forEach(f -> {
+HoodieTimeline.makeInflightCleanerFileName(commitTime))
+.forEach(f -> {
   FSDataOutputStream os = null;
   try {
 Path commitFile = new Path(
-metaClient.getBasePath() + "/" + 
HoodieTableMetaClient.METAFOLDER_NAME + "/" + f);
+metaClient.getBasePath() + "/" + 
HoodieTableMetaClient.METAFOLDER_NAME + "/" + f);

Review comment:
   ok @yanghua, understood. I agree that the work should focus on the 
task. I didn't intend to change these; IntelliJ did it for me. I should have 
disabled the auto-formatting. Will keep this sort of diff out from now on. 
Thanks





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-781] Introduce HoodieTestTable for test preparation (#1871)

2020-08-10 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b2e703d  [HUDI-781] Introduce HoodieTestTable for test preparation 
(#1871)
b2e703d is described below

commit b2e703d4427abca02b053facd5058aa256ef
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Mon Aug 10 18:44:03 2020 -0700

[HUDI-781] Introduce HoodieTestTable for test preparation (#1871)
---
 .../org/apache/hudi/io/HoodieAppendHandle.java |   1 +
 .../org/apache/hudi/io/HoodieCreateHandle.java |   1 +
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |   1 +
 .../java/org/apache/hudi/io/HoodieWriteHandle.java |   3 +-
 .../java/org/apache/hudi/table/MarkerFiles.java|  15 +-
 .../rollback/MarkerBasedRollbackStrategy.java  |   8 +-
 .../table/upgrade/ZeroToOneUpgradeHandler.java |   2 +-
 .../TestHoodieClientOnCopyOnWriteStorage.java  |   2 +-
 .../java/org/apache/hudi/table/TestCleaner.java| 393 -
 .../apache/hudi/table/TestConsistencyGuard.java|  28 +-
 .../org/apache/hudi/table/TestMarkerFiles.java |  10 +-
 .../table/action/commit/TestUpsertPartitioner.java |   8 +-
 .../table/action/compact/TestHoodieCompactor.java  |   7 +-
 .../rollback/TestMarkerBasedRollbackStrategy.java  |  69 ++--
 .../hudi/testutils/HoodieClientTestUtils.java  |  99 ++
 .../java/org/apache/hudi/common/model}/IOType.java |  15 +-
 .../hudi/common/testutils/FileCreateUtils.java | 113 ++
 .../hudi/common/testutils/HoodieTestTable.java | 232 
 .../hudi/common/testutils/HoodieTestUtils.java | 102 +++---
 19 files changed, 642 insertions(+), 467 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
index 7a8e5ab..7996a77 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
@@ -32,6 +32,7 @@ import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
+import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.table.log.HoodieLogFormat;
 import org.apache.hudi.common.table.log.HoodieLogFormat.Writer;
 import org.apache.hudi.common.table.log.block.HoodieDataBlock;
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
index 705e98d..5a76dc7 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
@@ -28,6 +28,7 @@ import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
+import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
index f0ea284..8d54065 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
@@ -29,6 +29,7 @@ import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
+import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.util.DefaultSizeEstimator;
 import org.apache.hudi.common.util.HoodieRecordSizeEstimator;
 import org.apache.hudi.common.util.Option;
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
index d148b1b..5ea8c38 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
@@ -24,6 +24,7 @@ import org.apache.hudi.client.WriteStatus;
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.model.IOType;
 import org.apache.hudi.common.util.HoodieTimer;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ReflectionUtils;
@@ -33,13 +34,13 @@ impor

[GitHub] [hudi] yanghua merged pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


yanghua merged pull request #1871:
URL: https://github.com/apache/hudi/pull/1871


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on a change in pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


yanghua commented on a change in pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#discussion_r468276194



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java
##
@@ -237,11 +229,12 @@ public static void 
createPendingCleanFiles(HoodieTableMetaClient metaClient, Str
 
   public static void createCorruptedPendingCleanFiles(HoodieTableMetaClient 
metaClient, String commitTime) {
 Arrays.asList(HoodieTimeline.makeRequestedCleanerFileName(commitTime),
-HoodieTimeline.makeInflightCleanerFileName(commitTime)).forEach(f -> {
+HoodieTimeline.makeInflightCleanerFileName(commitTime))
+.forEach(f -> {
   FSDataOutputStream os = null;
   try {
 Path commitFile = new Path(
-metaClient.getBasePath() + "/" + 
HoodieTableMetaClient.METAFOLDER_NAME + "/" + f);
+metaClient.getBasePath() + "/" + 
HoodieTableMetaClient.METAFOLDER_NAME + "/" + f);

Review comment:
   Ok, I noticed this because I saw the indentation had been changed. Actually, 
I suggested that we focus on the core work the PR is meant to do and not 
touch anything else, e.g. indentation and code style. Of course, I should not ask you 
to change this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


nsivabalan commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468269077



##
File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
##
@@ -103,22 +103,26 @@ private static String getHiveConsoleCommandFile(String 
commandFile, String addit
 if (additionalVar != null) {
   builder.append(" --hivevar " + additionalVar + " ");
 }
-return builder.append(" -f ").append(commandFile).toString();
+builder.append(" -f ").append(commandFile);
+System.out.println("Hive command : " + builder.toString());

Review comment:
   you are right. I do see all commands are already logged. my bad. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf merged pull request #1942: [HUDI-1173] fix hudi-prometheus pom dependency

2020-08-10 Thread GitBox


leesf merged pull request #1942:
URL: https://github.com/apache/hudi/pull/1942


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-1173] fix hudi-prometheus pom dependency (#1942)

2020-08-10 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 934f00b  [HUDI-1173] fix hudi-prometheus pom dependency  (#1942)
934f00b is described below

commit 934f00b68976b68b9393406d83b9953811de90b4
Author: liujinhui <965147...@qq.com>
AuthorDate: Tue Aug 11 09:06:17 2020 +0800

[HUDI-1173] fix hudi-prometheus pom dependency  (#1942)
---
 packaging/hudi-spark-bundle/pom.xml | 5 +
 packaging/hudi-utilities-bundle/pom.xml | 5 +
 2 files changed, 10 insertions(+)

diff --git a/packaging/hudi-spark-bundle/pom.xml 
b/packaging/hudi-spark-bundle/pom.xml
index 0780751..145cf0f 100644
--- a/packaging/hudi-spark-bundle/pom.xml
+++ b/packaging/hudi-spark-bundle/pom.xml
@@ -89,6 +89,11 @@
   
com.twitter:bijection-core_${scala.binary.version}
   io.dropwizard.metrics:metrics-core
   io.dropwizard.metrics:metrics-graphite
+  io.prometheus:simpleclient
+  io.prometheus:simpleclient_httpserver
+  io.prometheus:simpleclient_dropwizard
+  io.prometheus:simpleclient_pushgateway
+  io.prometheus:simpleclient_common
   com.yammer.metrics:metrics-core
 
   
org.apache.spark:spark-avro_${scala.binary.version}
diff --git a/packaging/hudi-utilities-bundle/pom.xml 
b/packaging/hudi-utilities-bundle/pom.xml
index f985328..0a70e0e 100644
--- a/packaging/hudi-utilities-bundle/pom.xml
+++ b/packaging/hudi-utilities-bundle/pom.xml
@@ -96,6 +96,11 @@
   io.confluent:kafka-schema-registry-client
   io.dropwizard.metrics:metrics-core
   io.dropwizard.metrics:metrics-graphite
+  io.prometheus:simpleclient
+  io.prometheus:simpleclient_httpserver
+  io.prometheus:simpleclient_dropwizard
+  io.prometheus:simpleclient_pushgateway
+  io.prometheus:simpleclient_common
   com.yammer.metrics:metrics-core
   
org.apache.spark:spark-streaming-kafka-0-10_${scala.binary.version}
   
org.apache.kafka:kafka_${scala.binary.version}



[GitHub] [hudi] vinothchandar commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468266821



##
File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
##
@@ -103,22 +103,26 @@ private static String getHiveConsoleCommandFile(String 
commandFile, String addit
 if (additionalVar != null) {
   builder.append(" --hivevar " + additionalVar + " ");
 }
-return builder.append(" -f ").append(commandFile).toString();
+builder.append(" -f ").append(commandFile);
+System.out.println("Hive command : " + builder.toString());

Review comment:
   We should look into why these things are not being logged now. They 
showed up for sure before.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468266606



##
File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
##
@@ -103,22 +103,26 @@ private static String getHiveConsoleCommandFile(String 
commandFile, String addit
 if (additionalVar != null) {
   builder.append(" --hivevar " + additionalVar + " ");
 }
-return builder.append(" -f ").append(commandFile).toString();
+builder.append(" -f ").append(commandFile);
+System.out.println("Hive command : " + builder.toString());

Review comment:
   no I meant, why not log.info() or something? 
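
   A minimal sketch of that suggestion (the Logger wiring and the class/method names here are
   illustrative assumptions, not the actual ITTestBase code):

```java
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

class HiveCommandLogging {
  private static final Logger LOG = LogManager.getLogger(HiveCommandLogging.class);

  // Same behavior as the System.out.println above, but routed through the logger.
  static String buildHiveCommand(StringBuilder builder, String commandFile) {
    builder.append(" -f ").append(commandFile);
    LOG.info("Hive command : " + builder);
    return builder.toString();
  }
}
```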





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


nsivabalan commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468265130



##
File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
##
@@ -103,22 +103,26 @@ private static String getHiveConsoleCommandFile(String 
commandFile, String addit
 if (additionalVar != null) {
   builder.append(" --hivevar " + additionalVar + " ");
 }
-return builder.append(" -f ").append(commandFile).toString();
+builder.append(" -f ").append(commandFile);
+System.out.println("Hive command : " + builder.toString());

Review comment:
   This was useful in general while checking the integ test log output. Without 
it, it wasn't easy to understand where the tests failed when they hung 
or terminated halfway without an explicit exception. I prefer to keep this, 
actually. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


nsivabalan commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468265130



##
File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
##
@@ -103,22 +103,26 @@ private static String getHiveConsoleCommandFile(String 
commandFile, String addit
 if (additionalVar != null) {
   builder.append(" --hivevar " + additionalVar + " ");
 }
-return builder.append(" -f ").append(commandFile).toString();
+builder.append(" -f ").append(commandFile);
+System.out.println("Hive command : " + builder.toString());

Review comment:
   This was useful in general while checking the integ test log output. Without 
it, it wasn't easy to understand where the tests failed when they hung 
or terminated halfway without an explicit exception. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1119) MOR appends slow due to file listing in executor side for finding the log file

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1119:
-
Fix Version/s: (was: 0.6.0)

> MOR appends slow due to file listing in executor side for finding the log file
> --
>
> Key: HUDI-1119
> URL: https://issues.apache.org/jira/browse/HUDI-1119
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: perf
>
> Another place where we do listing in executor. 
> (Source : [https://github.com/apache/hudi/issues/1852])
> : 
> sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:259)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:167)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:124)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:180)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listFiles(AzureBlobFileSystemStore.java:549)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:628)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:532)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:344)
> org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
> org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.listStatus(HoodieWrapperFileSystem.java:487)
> org.apache.hudi.common.fs.FSUtils.getAllLogFiles(FSUtils.java:409)
> org.apache.hudi.common.fs.FSUtils.getLatestLogVersion(FSUtils.java:420)
> org.apache.hudi.common.fs.FSUtils.computeNextLogVersion(FSUtils.java:434)
> org.apache.hudi.common.model.HoodieLogFile.rollOver(HoodieLogFile.java:115)
> org.apache.hudi.common.table.log.HoodieLogFormatWriter.(HoodieLogFormatWriter.java:101)
> org.apache.hudi.common.table.log.HoodieLogFormat$WriterBuilder.build(HoodieLogFormat.java:249)
> org.apache.hudi.io.HoodieAppendHandle.createLogWriter(HoodieAppendHandle.java:291)
> org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:141)
> org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:197)
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:77)
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:246)
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:102)
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor$$Lambda$192/1449069739.call(Unknown
>  Source)
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:105)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1119) MOR appends slow due to file listing in executor side for finding the log file

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1119:
-
Priority: Major  (was: Blocker)

> MOR appends slow due to file listing in executor side for finding the log file
> --
>
> Key: HUDI-1119
> URL: https://issues.apache.org/jira/browse/HUDI-1119
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: perf
> Fix For: 0.6.0
>
>
> Another place where we do listing in executor. 
> (Source : [https://github.com/apache/hudi/issues/1852])
> : 
> sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:259)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:167)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:124)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:180)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listFiles(AzureBlobFileSystemStore.java:549)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:628)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:532)
> shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:344)
> org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
> org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.listStatus(HoodieWrapperFileSystem.java:487)
> org.apache.hudi.common.fs.FSUtils.getAllLogFiles(FSUtils.java:409)
> org.apache.hudi.common.fs.FSUtils.getLatestLogVersion(FSUtils.java:420)
> org.apache.hudi.common.fs.FSUtils.computeNextLogVersion(FSUtils.java:434)
> org.apache.hudi.common.model.HoodieLogFile.rollOver(HoodieLogFile.java:115)
> org.apache.hudi.common.table.log.HoodieLogFormatWriter.(HoodieLogFormatWriter.java:101)
> org.apache.hudi.common.table.log.HoodieLogFormat$WriterBuilder.build(HoodieLogFormat.java:249)
> org.apache.hudi.io.HoodieAppendHandle.createLogWriter(HoodieAppendHandle.java:291)
> org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:141)
> org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:197)
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:77)
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:246)
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:102)
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor$$Lambda$192/1449069739.call(Unknown
>  Source)
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:105)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-289) Implement a test suite to support long running test for Hudi writing and querying end-end

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-289:

Fix Version/s: (was: 0.6.0)
 Priority: Major  (was: Blocker)

> Implement a test suite to support long running test for Hudi writing and 
> querying end-end
> -
>
> Key: HUDI-289
> URL: https://issues.apache.org/jira/browse/HUDI-289
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
>
> We would need an equivalent of an end-to-end test which runs some workload for 
> at least a few hours, triggers various actions like commit, deltacommit, 
> rollback, compaction and ensures correctness of code before every release
> P.S: Learn from all the CSS issues managing compaction..
> The feature branch is here: 
> [https://github.com/apache/incubator-hudi/tree/hudi_test_suite_refactor]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-920) Incremental view on MOR table using Spark Datasource

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-920:

Status: Patch Available  (was: In Progress)

> Incremental view on MOR table using Spark Datasource
> 
>
> Key: HUDI-920
> URL: https://issues.apache.org/jira/browse/HUDI-920
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-920) Incremental view on MOR table using Spark Datasource

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-920:

Status: In Progress  (was: Open)

> Incremental view on MOR table using Spark Datasource
> 
>
> Key: HUDI-920
> URL: https://issues.apache.org/jira/browse/HUDI-920
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-808) Support for cleaning source data

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-808:

Status: In Progress  (was: Open)

> Support for cleaning source data
> 
>
> Key: HUDI-808
> URL: https://issues.apache.org/jira/browse/HUDI-808
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Wenning Ding
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> This is an important requirement from a GDPR perspective. When performing 
> deletion on a metadata-only bootstrapped partition, users should have the 
> ability to request cleanup of the original data from the source location, 
> because as per this new bootstrapping mechanism the original data serves as 
> the data in the original commit for Hudi.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-808) Support for cleaning source data

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-808:

Status: Patch Available  (was: In Progress)

> Support for cleaning source data
> 
>
> Key: HUDI-808
> URL: https://issues.apache.org/jira/browse/HUDI-808
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Wenning Ding
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> This is an important requirement from a GDPR perspective. When performing 
> deletion on a metadata-only bootstrapped partition, users should have the 
> ability to request cleanup of the original data from the source location, 
> because as per this new bootstrapping mechanism the original data serves as 
> the data in the original commit for Hudi.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1014) Design and Implement upgrade-downgrade infrastrucutre

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1014:
-
Status: Closed  (was: Patch Available)

> Design and Implement upgrade-downgrade infrastrucutre
> -
>
> Key: HUDI-1014
> URL: https://issues.apache.org/jira/browse/HUDI-1014
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-08-10 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1098:
--
Status: Closed  (was: Patch Available)

> Marker file finalizing may block on a data file that was never written
> --
>
> Key: HUDI-1098
> URL: https://issues.apache.org/jira/browse/HUDI-1098
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> {code:java}
> // Ensure all files in delete list is actually present. This is mandatory for 
> // an eventually consistent FS. Otherwise, we may miss deleting such files. 
> // If files are not found even after retries, fail the commit 
> if (consistencyCheckEnabled) { 
>   // This will either ensure all files to be deleted are present. 
>   waitForAllFiles(jsc, groupByPartition, FileVisibility.APPEAR); 
> }
> {code}
> We need to handle the case where marker file was created, but we crashed 
> before the data file was created. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-971) Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean partition name

2020-08-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-971:

Status: Closed  (was: Patch Available)

> Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean 
> partition name
> ---
>
> Key: HUDI-971
> URL: https://issues.apache.org/jira/browse/HUDI-971
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Wenning Ding
>Assignee: Wenning Ding
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> When calling HFileBootstrapIndexReader.getIndexedPartitions(), it will return 
> unclean partitions because of 
> [https://github.com/apache/hbase/blob/rel/1.2.3/hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java#L768].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468251623



##
File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
##
@@ -103,22 +103,26 @@ private static String getHiveConsoleCommandFile(String 
commandFile, String addit
 if (additionalVar != null) {
   builder.append(" --hivevar " + additionalVar + " ");
 }
-return builder.append(" -f ").append(commandFile).toString();
+builder.append(" -f ").append(commandFile);
+System.out.println("Hive command : " + builder.toString());

Review comment:
   can we clean up these s.o.p? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #1941: [SUPPORT] partition's value changed with hbase index

2020-08-10 Thread GitBox


bvaradar commented on issue #1941:
URL: https://github.com/apache/hudi/issues/1941#issuecomment-671645896


   @satishkotha @n3nash : Can you please chime in on this ? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #1940: [SUPPORT] In CDC scenario, Does Hudi support schema enforcement like Delta Lake?

2020-08-10 Thread GitBox


bvaradar commented on issue #1940:
URL: https://github.com/apache/hudi/issues/1940#issuecomment-671645321


   Hudi supports Avro schema evolution and compatibility rules. We are also 
planning to rethink schema evolution in general for our next major release. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


nsivabalan commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468245640



##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestHoodieDemo.java
##
@@ -115,6 +116,7 @@ public void testParquetDemo() throws Exception {
   }
 
   private void setupDemo() throws Exception {
+Thread.sleep(60 * 1000);

Review comment:
   I didn't see this in the last 30 odd runs. So, gonna be tough to repro. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on a change in pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


bvaradar commented on a change in pull request #1945:
URL: https://github.com/apache/hudi/pull/1945#discussion_r468245015



##
File path: 
hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestHoodieDemo.java
##
@@ -115,6 +116,7 @@ public void testParquetDemo() throws Exception {
   }
 
   private void setupDemo() throws Exception {
+Thread.sleep(60 * 1000);

Review comment:
   @nsivabalan: I had earlier suggested this to unblock debugging. Can you 
see why the data-node is only coming up occasionally?
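
   If the fixed Thread.sleep turns out to be compensating for a slow data-node startup, one
   alternative is to poll for readiness instead of sleeping a fixed interval. A hedged sketch
   (host, port and timeouts are illustrative assumptions, not the actual demo setup):

```java
import java.net.InetSocketAddress;
import java.net.Socket;

class WaitForService {
  // Retry a TCP connect until the service answers or the deadline passes.
  static void awaitReachable(String host, int port, long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress(host, port), 2000);
        return; // service is up
      } catch (Exception e) {
        Thread.sleep(1000); // not up yet, retry
      }
    }
    throw new IllegalStateException(host + ":" + port + " not reachable within " + timeoutMs + " ms");
  }
}
```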





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] umehrot2 commented on pull request #1944: [HUDI-1174] Changes for bootstrapped tables to work with presto

2020-08-10 Thread GitBox


umehrot2 commented on pull request #1944:
URL: https://github.com/apache/hudi/pull/1944#issuecomment-671642813


   cc @vinothchandar @bvaradar @bhasudha 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1175) Investigate CI test flakiness (hangs)

2020-08-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1175:
-
Labels: pull-request-available  (was: )

> Investigate CI test flakiness (hangs)
> -
>
> Key: HUDI-1175
> URL: https://issues.apache.org/jira/browse/HUDI-1175
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Code Cleanup
>Affects Versions: 0.6.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan opened a new pull request #1945: [HUDI-1175] Minor fixes for CI flakiness

2020-08-10 Thread GitBox


nsivabalan opened a new pull request #1945:
URL: https://github.com/apache/hudi/pull/1945


   - Adding some log statements
   - Commenting out testsuite tests from integration tests until we investigate 
CI flakiness
   
   ## What is the purpose of the pull request
   
   *This patch comments out test suite tests from integration tests. Also, adds 
some log statements*
   
   ## Brief change log
   
 - Commented out test suite tests from integration tests
 - Adding log statements before each command starts
   
   ## Verify this pull request
   
   This pull request is trivial work.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1175) Investigate CI test flakiness (hangs)

2020-08-10 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1175:
-

 Summary: Investigate CI test flakiness (hangs)
 Key: HUDI-1175
 URL: https://issues.apache.org/jira/browse/HUDI-1175
 Project: Apache Hudi
  Issue Type: Bug
  Components: Code Cleanup
Affects Versions: 0.6.0
Reporter: sivabalan narayanan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-999) Parallelize listing of Source dataset partitions

2020-08-10 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra resolved HUDI-999.

Resolution: Fixed

> Parallelize listing of Source dataset partitions 
> -
>
> Key: HUDI-999
> URL: https://issues.apache.org/jira/browse/HUDI-999
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: bootstrap
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Currently, we are using a single thread in the driver to list all partitions in 
> the source dataset. This is a bottleneck when doing metadata bootstrap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-620) Hive Sync Integration of bootstrapped table

2020-08-10 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-620:
---
Status: Closed  (was: Patch Available)

> Hive Sync Integration of bootstrapped table
> ---
>
> Key: HUDI-620
> URL: https://issues.apache.org/jira/browse/HUDI-620
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Blocker
>  Time Spent: 72h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-620) Hive Sync Integration of bootstrapped table

2020-08-10 Thread Udit Mehrotra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175092#comment-17175092
 ] 

Udit Mehrotra commented on HUDI-620:


Resolved by https://github.com/apache/hudi/pull/1702/

> Hive Sync Integration of bootstrapped table
> ---
>
> Key: HUDI-620
> URL: https://issues.apache.org/jira/browse/HUDI-620
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Blocker
>  Time Spent: 72h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-427) Implement CLI support for performing bootstrap

2020-08-10 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra resolved HUDI-427.

Resolution: Fixed

> Implement CLI support for performing bootstrap
> --
>
> Key: HUDI-427
> URL: https://issues.apache.org/jira/browse/HUDI-427
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Balaji Varadarajan
>Assignee: Wenning Ding
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 168h
>  Remaining Estimate: 0h
>
> Need CLI to perform bootstrap as described in 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-426) Implement Spark DataSource Support for querying bootstrapped tables

2020-08-10 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra resolved HUDI-426.

Resolution: Fixed

> Implement Spark DataSource Support for querying bootstrapped tables
> ---
>
> Key: HUDI-426
> URL: https://issues.apache.org/jira/browse/HUDI-426
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need ability in SparkDataSource to query COW table which is bootstrapped 
> as per 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+:+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC-12:EfficientMigrationofLargeParquetTablestoApacheHudi-BootstrapIndex:]
>  
> Current implementation delegates to Parquet DataSource but this wont work as 
> we need ability to stitch the columns externally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1174) Hudi changes for bootstrapped tables integration with Presto

2020-08-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1174:
-
Labels: pull-request-available  (was: )

> Hudi changes for bootstrapped tables integration with Presto
> 
>
> Key: HUDI-1174
> URL: https://issues.apache.org/jira/browse/HUDI-1174
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: bootstrap
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Hudi changes for bootstrapped tables integration with Presto.
>  * Annotation *UseRecordReaderFromInputFormat* is required on 
> *HoodieParquetInputFormat* as well, because the reading for bootstrapped 
> tables needs to happen through record reader to be able to perform the merge. 
> On presto side, this annotation is already handled.
>  * We need to internally maintain *VIRTUAL_COLUMN_NAMES* because presto's 
> internal hive version *hive-apache-1.2.2* has *VirtualColumn* as a *class*, 
> versus the one we depend on in hudi which is an *enum*. This results in the 
> following error in presto:
>  
> {noformat}
> 2020-08-10T21:59:58.957Z ERROR remote-task-callback-2 
> com.facebook.presto.execution.StageExecutionStateMachine Stage execution 
> 20200810_215953_6_34kqg.1.0 failed
> java.lang.NoSuchFieldError: VIRTUAL_COLUMN_NAMES
>  at 
> org.apache.hudi.hadoop.HoodieParquetInputFormat.lambda$getRecordReader$2(HoodieParquetInputFormat.java:201)
>  at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>  at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
>  at 
> org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:203)
>  at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:253)
>  at 
> com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:74)
>  at 
> com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
>  at 
> com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
>  at 
> com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
>  at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:82)
>  at 
> com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:73)
>  at 
> com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:374)
>  at 
> com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:137)
>  at 
> com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:113)
>  at 
> com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:52)
> {noformat}
>  
>  * Dependency changes in *hudi-presto-bundle* to avoid runtime exceptions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] umehrot2 opened a new pull request #1944: [HUDI-1174] Changes for bootstrapped tables to work with presto

2020-08-10 Thread GitBox


umehrot2 opened a new pull request #1944:
URL: https://github.com/apache/hudi/pull/1944


   ## What is the purpose of the pull request
   
   The purpose of this pull request is to implement changes required on the Hudi 
side to get bootstrapped tables integrated with Presto. The testing was done 
against **presto 0.232** and the following changes were identified to make it work:
   
   - Annotation **UseRecordReaderFromInputFormat** is required on 
**HoodieParquetInputFormat** as well, because the reading for bootstrapped 
tables needs to happen through record reader to be able to perform the merge. 
On presto side, this annotation is already handled.
   
   - We need to internally maintain `VIRTUAL_COLUMN_NAMES` because presto's 
internal hive version **hive-apache-1.2.2** has `VirtualColumn` as a class, 
versus the one we depend on in hudi which is an **enum**. This results in the 
following error in presto:
   ```
   2020-08-10T21:59:58.957Z ERROR   remote-task-callback-2  
com.facebook.presto.execution.StageExecutionStateMachine Stage execution 
20200810_215953_6_34kqg.1.0 failed
   java.lang.NoSuchFieldError: VIRTUAL_COLUMN_NAMES
at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.lambda$getRecordReader$2(HoodieParquetInputFormat.java:201)
at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:203)
at 
com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:253)
at 
com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:74)
at 
com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
at 
com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
at 
com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
at 
com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:82)
at 
com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:73)
at 
com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:374)
at 
com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:137)
at 
com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:113)
at 
com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:52)
   ```
   - Dependency changes in `hudi-presto-bundle` to avoid runtime exceptions.
   
   ## Brief change log
   
   ## Verify this pull request
   
   The changes have been tested on **emr-5.30.1** against **presto 0.232**.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1174) Hudi changes for bootstrapped tables integration with Presto

2020-08-10 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1174:
---

 Summary: Hudi changes for bootstrapped tables integration with 
Presto
 Key: HUDI-1174
 URL: https://issues.apache.org/jira/browse/HUDI-1174
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: bootstrap
Reporter: Udit Mehrotra
Assignee: Udit Mehrotra
 Fix For: 0.6.0


Hudi changes for bootstrapped tables integration with Presto.
 * Annotation *UseRecordReaderFromInputFormat* is required on 
*HoodieParquetInputFormat* as well, because the reading for bootstrapped tables 
needs to happen through record reader to be able to perform the merge. On 
presto side, this annotation is already handled.
 * We need to internally maintain *VIRTUAL_COLUMN_NAMES* because presto's 
internal hive version *hive-apache-1.2.2* has *VirtualColumn* as a *class*, 
versus the one we depend on in hudi which is an *enum*. This results in the 
following error in presto:

 
{noformat}
2020-08-10T21:59:58.957Z ERROR remote-task-callback-2 
com.facebook.presto.execution.StageExecutionStateMachine Stage execution 
20200810_215953_6_34kqg.1.0 failed
java.lang.NoSuchFieldError: VIRTUAL_COLUMN_NAMES
 at org.apache.hudi.hadoop.HoodieParquetInputFormat.lambda$getRecordReader$2(HoodieParquetInputFormat.java:201)
 at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
 at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
 at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
 at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
 at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:203)
 at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:253)
 at com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:74)
 at com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:360)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
 at com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
 at com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
 at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:82)
 at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:73)
 at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:374)
 at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:137)
 at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:113)
 at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:52)
{noformat}
 
 * Dependency changes in *hudi-presto-bundle* to avoid runtime exceptions.
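
A minimal sketch of the VIRTUAL_COLUMN_NAMES idea above (not the actual Hudi 
implementation; the exact column set and helper names are assumptions): keep the 
virtual column names as plain strings so neither VirtualColumn variant is referenced, 
and strip them from the projection before building the record reader.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class VirtualColumnFilter {
  // Hive's standard virtual column names kept as plain strings, so neither the
  // enum (stock Hive) nor the class (hive-apache-1.2.2) form of VirtualColumn is
  // touched at runtime. The exact set below is an assumption for illustration.
  private static final Set<String> VIRTUAL_COLUMN_NAMES = new HashSet<>(Arrays.asList(
      "INPUT__FILE__NAME", "BLOCK__OFFSET__INSIDE__FILE", "ROW__OFFSET__INSIDE__BLOCK",
      "RAW__DATA__SIZE", "GROUPING__ID", "ROW__ID"));

  // Drop virtual columns from the requested projection before handing it to the reader.
  static List<String> physicalColumns(List<String> requestedColumns) {
    return requestedColumns.stream()
        .filter(col -> !VIRTUAL_COLUMN_NAMES.contains(col.toUpperCase()))
        .collect(Collectors.toList());
  }
}
{code}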

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] steveloughran commented on issue #1837: [SUPPORT]S3 file listing causing compaction to get eventually slow

2020-08-10 Thread GitBox


steveloughran commented on issue #1837:
URL: https://github.com/apache/hudi/issues/1837#issuecomment-671523549


   The issue here is that treewalking is pathologically bad for S3. Asking for 
a deep listing is often more efficient; filesystem.listFiles(path, 
recursive=true) will do this
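
   A minimal sketch of the suggested deep listing via the Hadoop FileSystem API 
(the bucket and path below are placeholders):
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.LocatedFileStatus;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.fs.RemoteIterator;

   class DeepListingExample {
     public static void main(String[] args) throws Exception {
       Path base = new Path("s3a://some-bucket/some-table");  // placeholder location
       FileSystem fs = base.getFileSystem(new Configuration());
       // One recursive listing instead of a directory-by-directory tree walk;
       // on S3 this avoids issuing a separate LIST request per directory.
       RemoteIterator<LocatedFileStatus> files = fs.listFiles(base, true);
       while (files.hasNext()) {
         LocatedFileStatus status = files.next();
         System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
       }
     }
   }
   ```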



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jpugliesi edited a comment on issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-08-10 Thread GitBox


jpugliesi edited a comment on issue #1925:
URL: https://github.com/apache/hudi/issues/1925#issuecomment-671522522


   @bvaradar looks like this works - thanks for your help.
   
   One follow up question about this `SchemaRegistryProvider` - is it possible 
to configure Hudi to use this `SchemaProvider` (specifically 
`SchemaRegistryProvider`) with the Datasource Writer (Spark) API to assert 
source/target schemas as well? (Happy to create a new ticket to track this 
conversation if need be)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jpugliesi commented on issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-08-10 Thread GitBox


jpugliesi commented on issue #1925:
URL: https://github.com/apache/hudi/issues/1925#issuecomment-671522522


   @bvaradar looks like this works - thanks for your help.
   
   One follow up question about this `SchemaRegistryProvider` - is it possible 
to configure Hudi to use this `SchemaProvider` (specifically 
`SchemaRegistryProvider`) with the Datasource Writer (Spark) API to assert 
source/target schemas as well? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] tooptoop4 edited a comment on issue #1813: ERROR HoodieDeltaStreamer: Got error running delta sync once.

2020-08-10 Thread GitBox


tooptoop4 edited a comment on issue #1813:
URL: https://github.com/apache/hudi/issues/1813#issuecomment-671512191


   @bhasudha how does checkpointing work here? ie after some time of running 
DeltaStreamer job i need to stop the DeltaStreamer job, destroy old EC2, launch 
new EC2, restart DeltaStreamer job. How does DeltaStreamer job know to skip 
some of the raw change capture parquets (that were already processed into Hudi 
table) and resume from certain point of them?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] tooptoop4 commented on issue #1813: ERROR HoodieDeltaStreamer: Got error running delta sync once.

2020-08-10 Thread GitBox


tooptoop4 commented on issue #1813:
URL: https://github.com/apache/hudi/issues/1813#issuecomment-671512191


   @bhasudha how does checkpointing work here? ie after some time of running 
DeltaStreamer job i need to stop the DeltaStreamer job, destroy old EC2, launch 
new EC2, restart DeltaStreamer job. How does DeltaStreamer job know to skip 
some of the raw change capture parquets and resume from certain point of them?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-10 Thread GitBox


zhedoubushishi commented on a change in pull request #1870:
URL: https://github.com/apache/hudi/pull/1870#discussion_r468069699



##
File path: hudi-common/src/main/java/org/apache/hudi/common/HoodieCleanStat.java
##
@@ -39,17 +40,34 @@
   private final List successDeleteFiles;
   // Files that could not be deleted
   private final List failedDeleteFiles;
+  // Boostrap Base Path patterns that were generated for the delete operation

Review comment:
   [typo] boostrap -> bootstrap

##
File path: hudi-common/src/main/java/org/apache/hudi/common/HoodieCleanStat.java
##
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.common;
 
+import java.util.ArrayList;

Review comment:
   [nit] wrong line

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/versioning/clean/CleanPlanV2MigrationHandler.java
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.table.timeline.versioning.clean;
+
+import java.util.HashMap;

Review comment:
   [nit] wrong line

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CleanerUtils.java
##
@@ -26,39 +26,49 @@
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
 import 
org.apache.hudi.common.table.timeline.versioning.clean.CleanMetadataMigrator;
-import 
org.apache.hudi.common.table.timeline.versioning.clean.CleanV1MigrationHandler;
-import 
org.apache.hudi.common.table.timeline.versioning.clean.CleanV2MigrationHandler;
+import 
org.apache.hudi.common.table.timeline.versioning.clean.CleanMetadataV1MigrationHandler;
+import 
org.apache.hudi.common.table.timeline.versioning.clean.CleanMetadataV2MigrationHandler;
 
 import java.io.IOException;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import 
org.apache.hudi.common.table.timeline.versioning.clean.CleanPlanMigrator;

Review comment:
   [nit] wrong line





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wfhartford opened a new issue #1943: [SUPPORT] Gradle fails with dependency on org.apache.hudi:hudi-spark_2.12:0.5.3

2020-08-10 Thread GitBox


wfhartford opened a new issue #1943:
URL: https://github.com/apache/hudi/issues/1943


   Using the `hudi-spark_2.12` artifact as a dependency in gradle fails with 
the following error:
   ```
   inconsistent module metadata found. Descriptor: 
org.apache.hudi:hudi-spark_2.11:0.5.3 Errors: bad module name: 
expected='hudi-spark_2.12' found='hudi-spark_2.11'
   ```
   
   The `pom.xml` file for `hudi-spark_2.12` has an artifactId element 
containing `hudi-spark_${scala.binary.version}`. Looking at the parent pom 
file, we find that the property `scala.binary.version` has a value of `2.11`. I 
believe that this is the source of the error. Gradle seems to be checking that 
the artifact ID in the pom file matches the artifact that it was trying to 
download, and finds an inconsistency. Maven does not seem to mind this 
inconsistency.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Clone this git repository: 
https://github.com/wfhartford/gradle-hudi-inconsistent-metadata
   2. Notice the very basic `build.gradle` file with a single dependency: 
`org.apache.hudi:hudi-spark_2.12:0.5.3`,
   3. Build the project with `./gradlew build`,
   4. The build fails with the error message above.
   
   **Expected behavior**
   
   Gradle downloads the HUDI dependency and builds the project.
   
   **Environment Description**
   
   * Hudi version : 0.5.3
   * Spark version : N/A
   * Hive version : N/A
   * Hadoop version : N/A
   * Storage (HDFS/S3/GCS..) : N/A
   * Running on Docker? (yes/no) : No
   
   **Additional context**
   
   I would like to use the non-bundled artifact because the bundled artifact 
`org.apache.hudi:hudi-spark-bundle_2.12:0.5.3` bundles in old versions of the 
kotlin standard library, which conflict with the up-to-date version I need in 
my project. I've worked around this issue by building HUDI from source and 
editing the main pom.xml file to use the version of kotlin which matches my 
project.
   
   **Full output from `./gradlew build`**
   
   ```
   > Task :compileJava FAILED
   
   FAILURE: Build failed with an exception.
   
   * What went wrong:
   Execution failed for task ':compileJava'.
   > Could not resolve all files for configuration ':compileClasspath'.
  > Could not resolve org.apache.hudi:hudi-spark_2.12:0.5.3.
Required by:
project :
 > Could not resolve org.apache.hudi:hudi-spark_2.12:0.5.3.
> inconsistent module metadata found. Descriptor: 
org.apache.hudi:hudi-spark_2.11:0.5.3 Errors: bad module name: 
expected='hudi-spark_2.12' found='hudi-spark_2.11'
   
   * Try:
   Run with --stacktrace option to get the stack trace. Run with --info or 
--debug option to get more log output. Run with --scan to get full insights.
   
   * Get more help at https://help.gradle.org
   
   Deprecated Gradle features were used in this build, making it incompatible 
with Gradle 7.0.
   Use '--warning-mode all' to show the individual deprecation warnings.
   See 
https://docs.gradle.org/6.5.1/userguide/command_line_interface.html#sec:command_line_warnings
   
   BUILD FAILED in 953ms
   1 actionable task: 1 executed
   ```
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1153) Spark DataSource and Streaming Write must fail when operation type is misconfigured

2020-08-10 Thread Sreeram Ramji (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreeram Ramji updated HUDI-1153:

Status: In Progress  (was: Open)

> Spark DataSource and Streaming Write must fail when operation type is 
> misconfigured
> ---
>
> Key: HUDI-1153
> URL: https://issues.apache.org/jira/browse/HUDI-1153
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Sreeram Ramji
>Priority: Major
> Fix For: 0.6.1
>
>
> Context: [https://github.com/apache/hudi/issues/1902#issuecomment-669698259]
>  
> If you look at DataSourceUtils.java, 
> [https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L257]
>  
> we are using string comparison to determine the operation type, which is a bad 
> idea, and a typo could result in "upsert" being used silently. 
>  
> Just like 
> [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L187]
>  being used for DeltaStreamer, we need similar enums defined in 
> DataSourceOptions.scala for OPERATION_OPT_KEY, but care must be taken to 
> ensure we do not cause a backwards compatibility issue by changing the property 
> value. In other words, we need to retain the lower case values 
> ("bulk_insert", "insert" and "upsert") but make it an enum (see the sketch below). 
>  
>  
>  
>  
>  
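
A hedged sketch of the enum approach described above, shown in Java for illustration 
only (the real constants would live in DataSourceOptions.scala, and the names here 
are assumptions):

{code:java}
import java.util.Arrays;

// Retains the existing lower-case property values ("bulk_insert", "insert", "upsert")
// while failing fast on any misconfigured operation string instead of silently
// falling back to upsert.
enum WriteOperation {
  BULK_INSERT("bulk_insert"),
  INSERT("insert"),
  UPSERT("upsert");

  private final String value;

  WriteOperation(String value) {
    this.value = value;
  }

  public String value() {
    return value;
  }

  public static WriteOperation fromValue(String value) {
    return Arrays.stream(values())
        .filter(op -> op.value.equals(value))
        .findFirst()
        .orElseThrow(() -> new IllegalArgumentException("Unknown operation: " + value));
  }
}
{code}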



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xushiyan commented on a change in pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


xushiyan commented on a change in pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#discussion_r468032133



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java
##
@@ -237,11 +229,12 @@ public static void 
createPendingCleanFiles(HoodieTableMetaClient metaClient, Str
 
   public static void createCorruptedPendingCleanFiles(HoodieTableMetaClient 
metaClient, String commitTime) {
 Arrays.asList(HoodieTimeline.makeRequestedCleanerFileName(commitTime),
-HoodieTimeline.makeInflightCleanerFileName(commitTime)).forEach(f -> {
+HoodieTimeline.makeInflightCleanerFileName(commitTime))
+.forEach(f -> {
   FSDataOutputStream os = null;
   try {
 Path commitFile = new Path(
-metaClient.getBasePath() + "/" + 
HoodieTableMetaClient.METAFOLDER_NAME + "/" + f);
+metaClient.getBasePath() + "/" + 
HoodieTableMetaClient.METAFOLDER_NAME + "/" + f);

Review comment:
   sure, fixed 2 of this. Please note that, as this is from the original 
codebase, i didn't change all of this kind of usage in this file. I think it 
could be a good chance to change while moving the APIs to `HoodieTestTable`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan commented on a change in pull request #1871: [HUDI-781] Introduce HoodieTestTable for test preparation

2020-08-10 Thread GitBox


xushiyan commented on a change in pull request #1871:
URL: https://github.com/apache/hudi/pull/1871#discussion_r468021630



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java
##
@@ -237,11 +229,12 @@ public static void 
createPendingCleanFiles(HoodieTableMetaClient metaClient, Str
 
   public static void createCorruptedPendingCleanFiles(HoodieTableMetaClient 
metaClient, String commitTime) {
 Arrays.asList(HoodieTimeline.makeRequestedCleanerFileName(commitTime),
-HoodieTimeline.makeInflightCleanerFileName(commitTime)).forEach(f -> {
+HoodieTimeline.makeInflightCleanerFileName(commitTime))
+.forEach(f -> {

Review comment:
   IntelliJ will force some weird indentation that breaks the checkstyle 
rule... this is the easiest way I can find to prevent that





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-10 Thread GitBox


vinothchandar commented on pull request #1870:
URL: https://github.com/apache/hudi/pull/1870#issuecomment-671401411


   @zhedoubushishi @umehrot2 please review this PR carefully ! We plan to land 
today 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-10 Thread GitBox


vinothchandar commented on a change in pull request #1870:
URL: https://github.com/apache/hudi/pull/1870#discussion_r467944196



##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
##
@@ -82,40 +83,45 @@ HoodieCleanerPlan requestClean(JavaSparkContext jsc) {
   LOG.info("Using cleanerParallelism: " + cleanerParallelism);
 
   jsc.setJobGroup(this.getClass().getSimpleName(), "Generates list of file 
slices to be cleaned");
-  Map> cleanOps = jsc
+  Map> cleanOps = jsc
   .parallelize(partitionsToClean, cleanerParallelism)
   .map(partitionPathToClean -> Pair.of(partitionPathToClean, 
planner.getDeletePaths(partitionPathToClean)))
   .collect().stream()
-  .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
+  .collect(Collectors.toMap(Pair::getKey,

Review comment:
   stylistic: in general, a stream within a stream is a bit hard to read. 
`flatMap()` first? But I guess this is a map. Probably using a named lambda 
function may help (see the sketch below)
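
   A small illustration of the named-lambda suggestion, using a self-contained 
example rather than the actual Hudi types (the names here are made up):
   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.stream.Collectors;

   class NamedLambdaExample {
     // Instead of nesting a stream inside Collectors.toMap, pull the inner
     // transformation into a named method so the collector reads flat.
     private static List<String> trimmed(List<String> values) {
       return values.stream().map(String::trim).collect(Collectors.toList());
     }

     static Map<String, List<String>> normalize(Map<String, List<String>> input) {
       return input.entrySet().stream()
           .collect(Collectors.toMap(Map.Entry::getKey, e -> trimmed(e.getValue())));
     }
   }
   ```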

##
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##
@@ -52,6 +52,8 @@
   public static final String MAX_COMMITS_TO_KEEP_PROP = 
"hoodie.keep.max.commits";
   public static final String MIN_COMMITS_TO_KEEP_PROP = 
"hoodie.keep.min.commits";
   public static final String COMMITS_ARCHIVAL_BATCH_SIZE_PROP = 
"hoodie.commits.archival.batch";
+  // Set true to clean bootstrap source files when necessary
+  public static final String CLEANER_BOOTSTRAP_BASE_FILE_ENABLED = 
"hoodie.cleaner.bootstrap.base.file";

Review comment:
   rename : `hoodie.cleaner.delete.bootstrap.base.file` ?

##
File path: hudi-common/src/main/java/org/apache/hudi/common/HoodieCleanStat.java
##
@@ -39,17 +40,34 @@
   private final List successDeleteFiles;
   // Files that could not be deleted
   private final List failedDeleteFiles;
+  // Boostrap Base Path patterns that were generated for the delete operation
+  private final List deleteBootstrapBasePathPatterns;
+  private final List successDeleteBootstrapBaseFiles;
+  // Files that could not be deleted
+  private final List failedDeleteBootstrapBaseFiles;
   // Earliest commit that was retained in this clean
   private final String earliestCommitToRetain;
 
   public HoodieCleanStat(HoodieCleaningPolicy policy, String partitionPath, 
List deletePathPatterns,
   List successDeleteFiles, List failedDeleteFiles, String 
earliestCommitToRetain) {
+this(policy, partitionPath, deletePathPatterns, successDeleteFiles, 
failedDeleteFiles, earliestCommitToRetain,
+new ArrayList<>(), new ArrayList<>(), new ArrayList<>());

Review comment:
   CollectionUtils.emptyList or something? 

##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
##
@@ -82,40 +83,45 @@ HoodieCleanerPlan requestClean(JavaSparkContext jsc) {
   LOG.info("Using cleanerParallelism: " + cleanerParallelism);
 
   jsc.setJobGroup(this.getClass().getSimpleName(), "Generates list of file 
slices to be cleaned");
-  Map> cleanOps = jsc
+  Map> cleanOps = jsc
   .parallelize(partitionsToClean, cleanerParallelism)
   .map(partitionPathToClean -> Pair.of(partitionPathToClean, 
planner.getDeletePaths(partitionPathToClean)))
   .collect().stream()
-  .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
+  .collect(Collectors.toMap(Pair::getKey,
+(y) -> 
y.getValue().stream().map(CleanFileInfo::toHoodieFileCleanInfo).collect(Collectors.toList(;
 
   return new HoodieCleanerPlan(earliestInstant
   .map(x -> new HoodieActionInstant(x.getTimestamp(), x.getAction(), 
x.getState().name())).orElse(null),
-  config.getCleanerPolicy().name(), cleanOps, 1);
+  config.getCleanerPolicy().name(), null, 
CleanPlanner.LATEST_CLEAN_PLAN_VERSION, cleanOps);
 } catch (IOException e) {
   throw new HoodieIOException("Failed to schedule clean operation", e);
 }
   }
 
-  private static PairFlatMapFunction>, String, 
PartitionCleanStat> deleteFilesFunc(
-  HoodieTable table) {
-return (PairFlatMapFunction>, String, 
PartitionCleanStat>) iter -> {
+  private static PairFlatMapFunction>, 
String, PartitionCleanStat>
+deleteFilesFunc(HoodieTable table) {
+return (PairFlatMapFunction>, 
String, PartitionCleanStat>) iter -> {
   Map partitionCleanStatMap = new HashMap<>();
-
   FileSystem fs = table.getMetaClient().getFs();
-  Path basePath = new Path(table.getMetaClient().getBasePath());
   while (iter.hasNext()) {
-Tuple2 partitionDelFileTuple = iter.next();
+Tuple2 partitionDelFileTuple = iter.next();
 String partitionPath = partitionDelFileTuple._1();
-String delFileName = partitionDelFileTuple._2();
-Path deletePath = 
FSUtils.getPartitionPath(FSUtils.get

[jira] [Updated] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1173:

Priority: Blocker  (was: Minor)

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1173:

Fix Version/s: 0.6.0

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] UZi5136225 commented on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


UZi5136225 commented on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671354955


   Yes, this configuration needs to be added



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer edited a comment on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


sbernauer edited a comment on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671353265


   @UZi5136225 i just noticed that i get a `java.lang.ClassNotFoundException: 
io.prometheus.client.exporter.common.TextFormat` when trying to access the 
interface. I think we should also add (i will give it a try)
   ```
   io.prometheus:simpleclient_common
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer edited a comment on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


sbernauer edited a comment on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671353265


   @UZi5136225 i just noticed that i get a `java.lang.ClassNotFoundException: 
io.prometheus.client.exporter.common.TextFormat` when trying to access the 
interface. I think we should also add
   ```
   io.prometheus:simpleclient_common
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer commented on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


sbernauer commented on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671353265


   @UZi5136225 i just noticed that i get a `java.lang.ClassNotFoundException: 
io.prometheus.client.exporter.common.TextFormat` when trying to access the 
interface. I think we should also add
   ```
   io.prometheus:simpleclient_common
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer commented on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


sbernauer commented on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671350777


   Thanks @UZi5136225, this fixed the problem!
   I don't understand why, because I added all the libs below in the correct 
version to the classpath. But it works :)
   ```
   mvn dependency:tree | grep simpleclient
   [INFO] +- io.prometheus:simpleclient:jar:0.8.0:compile
   [INFO] +- io.prometheus:simpleclient_httpserver:jar:0.8.0:compile
   [INFO] |  \- io.prometheus:simpleclient_common:jar:0.8.0:compile
   [INFO] +- io.prometheus:simpleclient_dropwizard:jar:0.8.0:compile
   [INFO] +- io.prometheus:simpleclient_pushgateway:jar:0.8.0:compile
   ```
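
   For context on why `simpleclient_common` matters here: the `TextFormat` class 
from the `ClassNotFoundException` mentioned above lives in that artifact and is 
what renders metrics in the Prometheus text exposition format. A minimal, 
illustrative sketch (not Hudi's actual reporter code):
   ```java
   import io.prometheus.client.CollectorRegistry;
   import io.prometheus.client.exporter.common.TextFormat;

   import java.io.StringWriter;
   import java.io.Writer;

   class MetricsTextExample {
     public static void main(String[] args) throws Exception {
       // Rendering the default registry in the Prometheus text format requires
       // TextFormat, which is shipped in io.prometheus:simpleclient_common.
       Writer writer = new StringWriter();
       TextFormat.write004(writer, CollectorRegistry.defaultRegistry.metricFamilySamples());
       System.out.println(writer.toString());
     }
   }
   ```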



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] UZi5136225 commented on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


UZi5136225 commented on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671336381


   https://github.com/apache/hudi/pull/1942 @sbernauer   You can try this PR



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer commented on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


sbernauer commented on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671334282


   Thanks a lot @UZi5136225 for your fast response!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1173:
-
Labels: pull-request-available  (was: )

> fix hudi-prometheus pom dependency
> --
>
> Key: HUDI-1173
> URL: https://issues.apache.org/jira/browse/HUDI-1173
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] UZi5136225 opened a new pull request #1942: [HUDI-1173] fix hudi-prometheus pom dependency

2020-08-10 Thread GitBox


UZi5136225 opened a new pull request #1942:
URL: https://github.com/apache/hudi/pull/1942


   fix hudi-prometheus pom dependency
   
   ## What is the purpose of the pull request
   
   fix hudi-prometheus pom dependency 
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] UZi5136225 commented on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


UZi5136225 commented on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671331840


   It's because I missed a few dependencies in the configuration; it will be fixed soon
   @sbernauer 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1173) fix hudi-prometheus pom dependency

2020-08-10 Thread liujinhui (Jira)
liujinhui created HUDI-1173:
---

 Summary: fix hudi-prometheus pom dependency
 Key: HUDI-1173
 URL: https://issues.apache.org/jira/browse/HUDI-1173
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: liujinhui






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] sbernauer edited a comment on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


sbernauer edited a comment on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671304318


   I don't use the include option but instead guarantee that all drivers and 
executors have the libs and correctly use them via SPARK_DIST_CLASSPATH.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer commented on pull request #1931: [HUDI-210] hudi-support-prometheus-pushgateway

2020-08-10 Thread GitBox


sbernauer commented on pull request #1931:
URL: https://github.com/apache/hudi/pull/1931#issuecomment-671304318


   I don't use the include option, but instead guarantee that all drivers and 
executors have the libs and correctly use them via SPARK_DIST_CLASSPATH.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



