[jira] [Updated] (HUDI-1392) lose partition info when using spark parameter "basePath"

2020-11-24 Thread steven zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

steven zhang updated HUDI-1392:
---
Description: 
Reproduce the issue with the steps below:

        set hoodie.datasource.write.hive_style_partitioning -> true

        spark.read().format("org.apache.hudi")
          .option("mergeSchema", true)
          .option("basePath", tablePath)
          .load(tablePath + (nonPartitionedTable ? "/*" : "/*"))
          .createOrReplaceTempView(hudiTable);

        spark.sql("select * from hudiTable where date>'20200807'").explain();

        the plan prints PartitionFilters: [], i.e. no partition pruning happens
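
        For reference, a compact Scala version of the steps above (the table path and view name are assumed, not taken from the report):

        val tablePath = "/tmp/hudi/hive_style_partitioned_table"   // assumed path
        spark.read.format("org.apache.hudi")
          .option("mergeSchema", "true")
          .option("basePath", tablePath)      // tells Spark where partition discovery should start
          .load(tablePath + "/*")             // glob into the hive-style partition directories
          .createOrReplaceTempView("hudiTable")

        // PartitionFilters: [] in the printed plan means the date predicate was not
        // turned into a partition-pruning filter
        spark.sql("select * from hudiTable where date > '20200807'").explain()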

The root cause:

step 1. Spark reads the data source
(https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 L317)

 

          case (dataSource: RelationProvider, None) =>
            dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
            // caseInsensitiveOptions is a CaseInsensitiveMap

 

step 2. Hudi creates the relation

         org.apache.hudi.DefaultSource#createRelation(sqlContext: SQLContext, optParams: Map[String, String], schema: StructType): BaseRelation = {

         // optParams is actually a CaseInsensitiveMap here, but the `Map ++` below rebuilds it as a plain, case-sensitive Map

         val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++ translateViewTypesToQueryTypes(optParams)
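
        To make the type loss concrete, a small standalone sketch (the query-type key and value strings below are illustrative stand-ins for QUERY_TYPE_OPT_KEY / DEFAULT_QUERY_TYPE_OPT_VAL):

        import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

        // Spark hands DefaultSource a CaseInsensitiveMap, so lookups ignore case
        val optParams: Map[String, String] = CaseInsensitiveMap(Map("basePath" -> "/tmp/hudi_table"))
        optParams.get("basepath")                    // Some(/tmp/hudi_table)

        // Building the defaults with `Map(...) ++ optParams` produces a plain Map, and
        // CaseInsensitiveMap iterates its lower-cased keys, so the merged map is
        // case-sensitive and only knows the key "basepath"
        val parameters = Map("hoodie.datasource.query.type" -> "snapshot") ++ optParams
        parameters.get("basePath")                   // None
        parameters.get("basepath")                   // Some(/tmp/hudi_table)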

 

step 3. Hudi turns this into a Parquet relation when we query a COW table's data

         it then calls getBaseFileOnlyView(sqlContext, parameters, schema, readPaths, isBootstrappedTable, globPaths, metaClient)

         which creates a new DataSource and relation instance with:

         DataSource.apply(sparkSession = sqlContext.sparkSession, paths = extraReadPaths, userSpecifiedSchema = Option(schema), className = "parquet", options = optParams).resolveRelation()

 

step 4. Spark fetches basePath to infer the partition info
(https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
 L196)

           // the parameters here come from DataSource#options (a plain Map)

          parameters.get(BASE_PATH_PARAM)

          so parameters.get(BASE_PATH_PARAM) calls Map#get, not CaseInsensitiveMap#get; the map stores the key as lowercase "basepath", so get("basePath") returns None

this is a Spark bug (fixed in version 3.0.1,
https://issues.apache.org/jira/browse/SPARK-32368); Hudi currently builds against Spark
v2.4.4

in order to avoid this Spark issue, a simple solution is to not convert the type of
the input optParams (Spark already makes it a CaseInsensitiveMap) in
org.apache.hudi.DefaultSource#createRelation(sqlContext: SQLContext, optParams:
Map[String, String]…
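
        A minimal sketch of that idea; the helper name is made up and the key/value strings stand in for QUERY_TYPE_OPT_KEY / DEFAULT_QUERY_TYPE_OPT_VAL. The actual merged change appears in the commit further down this digest:

        // Keep the concrete type of optParams (e.g. Spark's CaseInsensitiveMap) by adding
        // the default into it instead of folding it into a freshly built plain Map;
        // CaseInsensitiveMap#+ returns another CaseInsensitiveMap, so "basePath" stays
        // retrievable case-insensitively downstream.
        def withDefaultQueryType(optParams: Map[String, String]): Map[String, String] =
          if (optParams.contains("hoodie.datasource.query.type")) optParams
          else optParams + ("hoodie.datasource.query.type" -> "snapshot")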

  

  was:
Reproduce the issue with the steps below:

        set hoodie.datasource.write.hive_style_partitioning -> true

        spark.read().format("org.apache.hudi")
          .option("mergeSchema", true)
          .option("basePath", tablePath)
          .load(tablePath + (nonPartitionedTable ? "/*" : "/*"))
          .createOrReplaceTempView(hudiTable);

        spark.sql("select * from hudiTable where date>'20200807'").explain();

        the plan prints PartitionFilters: [], i.e. no partition pruning happens

the cause of this issue: org.apache.hudi.DefaultSource#createRelation is called
by dataSource.createRelation(sparkSession.sqlContext,
caseInsensitiveOptions) ([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala]
 L318)

the input optParams is a CaseInsensitiveMap. hudi attaches additional
parameters, such as

val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++
translateViewTypesToQueryTypes(optParams)

so the type of parameters has been converted to a plain Map, not a CaseInsensitiveMap

the parquet datasource infers partition info by fetching the basePath value through
parameters.get(BASE_PATH_PARAM) (
[https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala]
 L196); that get no longer goes through CaseInsensitiveMap#get, it is a plain
Map#get("basePath") on a map whose key is stored as "basepath", so it returns None
and no partition info is inferred.

also, Spark 2.4.7 and above (
https://issues.apache.org/jira/browse/SPARK-32364 ) already use a CaseInsensitiveMap
to fetch basePath, although the intention there is not the same as this hudi issue,
and lower Spark versions still have this problem.

so we should use

val parameters = translateViewTypesToQueryTypes(optParams) ++
Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL)

for two reasons: 1. lower Spark versions also have this issue; 2. the original map
type is no longer converted.

  


> lose partition info when using spark parameter "basePath" 
> --
>
> Key: HUDI-1392
> URL: https://issues.apache.org/jira/browse/HUDI-1392
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark 

[GitHub] [hudi] quitozang closed issue #2274: [SUPPORT]

2020-11-24 Thread GitBox


quitozang closed issue #2274:
URL: https://github.com/apache/hudi/issues/2274


   







[GitHub] [hudi] shenh062326 commented on pull request #2222: [HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client

2020-11-24 Thread GitBox


shenh062326 commented on pull request #:
URL: https://github.com/apache/hudi/pull/#issuecomment-733449190


   > @shenh062326 are you planning to follow on with a full impl of a java 
based client? Changes LGTM.
   
   Yes, I will add a full impl of a java based client.







[jira] [Created] (HUDI-1416) [Documentation] Documentation is confusing

2020-11-24 Thread Hemanga Borah (Jira)
Hemanga Borah created HUDI-1416:
---

 Summary: [Documentation] Documentation is confusing
 Key: HUDI-1416
 URL: https://issues.apache.org/jira/browse/HUDI-1416
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Docs
Reporter: Hemanga Borah


Doc: [https://hudi.apache.org/docs/concepts.html#merge-on-read-table]

The doc says, "Merge on read table is a superset of copy on write, in the sense 
it still supports read optimized queries of the table by exposing only the 
base/columnar files in latest file slices." 

However, above in the table 
(https://hudi.apache.org/docs/concepts.html#table-types--queries), it is 
mentioned that only "Merge On Read" supports "Read Optimized Queries".

 

Another way of writing this would be:

"Merge on read table is a superset of copy on write, in the sense that it 
*additionally* supports read optimized queries of the table by exposing only 
the base/columnar files in latest file slices."





[GitHub] [hudi] garyli1019 merged pull request #2243: HUDI-1392 lose partition info when using spark parameter basePath

2020-11-24 Thread GitBox


garyli1019 merged pull request #2243:
URL: https://github.com/apache/hudi/pull/2243


   







[hudi] branch master updated: [HUDI-1392] lose partition info when using spark parameter basePath (#2243)

2020-11-24 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 56866a1  [HUDI-1392] lose partition info when using spark parameter 
basePath (#2243)
56866a1 is described below

commit 56866a11fe8b7a0ef8340f221da30c83c72b85da
Author: steven zhang 
AuthorDate: Wed Nov 25 11:55:33 2020 +0800

[HUDI-1392] lose partition info when using spark parameter basePath (#2243)

Co-authored-by: zhang wen 
---
 .../src/main/scala/org/apache/hudi/DataSourceOptions.scala | 10 +++---
 hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala  |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala b/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
index fc52b38..73f70e7 100644
--- a/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
+++ b/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
@@ -81,9 +81,13 @@ object DataSourceReadOptions {
     val translation = Map(VIEW_TYPE_READ_OPTIMIZED_OPT_VAL -> QUERY_TYPE_SNAPSHOT_OPT_VAL,
       VIEW_TYPE_INCREMENTAL_OPT_VAL -> QUERY_TYPE_INCREMENTAL_OPT_VAL,
       VIEW_TYPE_REALTIME_OPT_VAL -> QUERY_TYPE_SNAPSHOT_OPT_VAL)
-    if (optParams.contains(VIEW_TYPE_OPT_KEY) && !optParams.contains(QUERY_TYPE_OPT_KEY)) {
-      log.warn(VIEW_TYPE_OPT_KEY + " is deprecated and will be removed in a later release. Please use " + QUERY_TYPE_OPT_KEY)
-      optParams ++ Map(QUERY_TYPE_OPT_KEY -> translation(optParams(VIEW_TYPE_OPT_KEY)))
+    if (!optParams.contains(QUERY_TYPE_OPT_KEY)) {
+      if (optParams.contains(VIEW_TYPE_OPT_KEY)) {
+        log.warn(VIEW_TYPE_OPT_KEY + " is deprecated and will be removed in a later release. Please use " + QUERY_TYPE_OPT_KEY)
+        optParams ++ Map(QUERY_TYPE_OPT_KEY -> translation(optParams(VIEW_TYPE_OPT_KEY)))
+      } else {
+        optParams ++ Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL)
+      }
     } else {
       optParams
     }
diff --git a/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala b/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
index 1cf9bdb..4a78378 100644
--- a/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
+++ b/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
@@ -55,7 +55,7 @@ class DefaultSource extends RelationProvider
                               optParams: Map[String, String],
                               schema: StructType): BaseRelation = {
     // Add default options for unspecified read options keys.
-    val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++ translateViewTypesToQueryTypes(optParams)
+    val parameters = translateViewTypesToQueryTypes(optParams)
 
     val path = parameters.get("path")
     val readPathsStr = parameters.get(DataSourceReadOptions.READ_PATHS_OPT_KEY)



[GitHub] [hudi] garyli1019 commented on pull request #2243: HUDI-1392 lose partition info when using spark parameter basePath

2020-11-24 Thread GitBox


garyli1019 commented on pull request #2243:
URL: https://github.com/apache/hudi/pull/2243#issuecomment-733445977


   @yui2010 merging. Please assign the Jira ticket to yourself and close it. If 
you don't have contributor access yet, please send an email with your Jira ID 
to the dev mailing list and someone will add you to the project. Thanks!







[GitHub] [hudi] bithw1 edited a comment on issue #2276: [SUPPORT] java.lang.IllegalStateException: No Compaction request available

2020-11-24 Thread GitBox


bithw1 edited a comment on issue #2276:
URL: https://github.com/apache/hudi/issues/2276#issuecomment-733441100


   The code that creates/upserts the table is as follows. I have explicitly specified the following two lines to disable compaction:
   
   ```
   .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, "false")
   .option(DataSourceWriteOptions.ASYNC_COMPACT_ENABLE_OPT_KEY, "false")
   ```
   
   Not sure how I can exercise the compaction feature from code. Could you please help, @bvaradar? Thanks!
   
   
   
   
   
   
   ```
   package org.example.hudi

   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.hudi.config.{HoodieCompactionConfig, HoodieIndexConfig, HoodieWriteConfig}
   import org.apache.hudi.index.HoodieIndex
   import org.apache.spark.sql.{SaveMode, SparkSession}

   case class MyOrder(
     name: String,
     price: String,
     creation_date: String,
     dt: String)

   object MORWorkTest {

     val overwrite1Data = Seq(
       MyOrder("A", "1", "2020-11-18 14:43:32", "2020-11-19"),
       MyOrder("B", "1", "2020-11-18 14:42:21", "2020-11-19"),
       MyOrder("C", "1", "2020-11-18 14:47:19", "2020-11-19"),
       MyOrder("D", "1", "2020-11-18 14:46:50", "2020-11-19")
     )

     val insertUpdate1Data = Seq(
       MyOrder("A", "2", "2020-11-18 14:50:32", "2020-11-19"),
       MyOrder("B", "2", "2020-11-18 14:50:21", "2020-11-19"),
       MyOrder("C", "2", "2020-11-18 14:50:19", "2020-11-19"),
       MyOrder("D", "2", "2020-11-18 14:50:50", "2020-11-19")
     )

     val insertUpdate2Data = Seq(
       MyOrder("A", "3", "2020-11-18 14:53:32", "2020-11-19"),
       MyOrder("B", "3", "2020-11-18 14:52:21", "2020-11-19"),
       MyOrder("C", "3", "2020-11-18 14:57:19", "2020-11-19"),
       MyOrder("D", "3", "2020-11-18 14:56:50", "2020-11-19")
     )

     val spark = SparkSession.builder.appName("MORTest")
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .config("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse")
       .enableHiveSupport().getOrCreate()

     val hudi_table = "hudi_hive_read_write_mor_5"

     val base_path = s"/data/hudi_demo/$hudi_table"

     def run(op: Int) = {
       val (data, saveMode) = op match {
         case 1 => (overwrite1Data, SaveMode.Overwrite)
         case 2 => (insertUpdate1Data, SaveMode.Append)
         case 3 => (insertUpdate2Data, SaveMode.Append)
       }
       import spark.implicits._
       val insertData = spark.createDataset(data)
       insertData.write.format("hudi")
         .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "name")
         .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "creation_date")
         .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "xyz")
         .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, hudi_table)
         .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
         .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
         .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "dt")
         // table type: MOR
         .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
         // disable async compact
         .option(DataSourceWriteOptions.ASYNC_COMPACT_ENABLE_OPT_KEY, "false")
         .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, 100)
         // disable inline compact
         .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, "false")
         .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://10.41.90.208:1")
         .option(HoodieWriteConfig.TABLE_NAME, hudi_table)
         .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.MultiPartKeysValueExtractor")
         .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
         .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "dt")
         .option("hoodie.insert.shuffle.parallelism", "2")
         .option("hoodie.upsert.shuffle.parallelism", "2")
         .mode(saveMode)
         .save(base_path)
     }

     def main(args: Array[String]): Unit = {
       // do overwrite
       run(1)

       // do upsert
       run(2)

       // do upsert
       run(3)

       println("===MOR is done=")
     }

   }
   ```
   
   
   
   
   
   
   
   
   
   







[GitHub] [hudi] bithw1 commented on issue #2276: [SUPPORT] java.lang.IllegalStateException: No Compaction request available

2020-11-24 Thread GitBox


bithw1 commented on issue #2276:
URL: https://github.com/apache/hudi/issues/2276#issuecomment-733441100


   The code that creates/upserts the table is as follows. I have explicitly specified the following two lines to disable compaction:
   
   .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, "false")
   .option(DataSourceWriteOptions.ASYNC_COMPACT_ENABLE_OPT_KEY, "false")
   
   Not sure how I can exercise the compaction feature from code. Could you please help, @bvaradar? Thanks!
   
   
   
   
   
   
   ```
   package org.example.hudi

   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.hudi.config.{HoodieCompactionConfig, HoodieIndexConfig, HoodieWriteConfig}
   import org.apache.hudi.index.HoodieIndex
   import org.apache.spark.sql.{SaveMode, SparkSession}

   case class MyOrder(
     name: String,
     price: String,
     creation_date: String,
     dt: String)

   object MORWorkTest {

     val overwrite1Data = Seq(
       MyOrder("A", "1", "2020-11-18 14:43:32", "2020-11-19"),
       MyOrder("B", "1", "2020-11-18 14:42:21", "2020-11-19"),
       MyOrder("C", "1", "2020-11-18 14:47:19", "2020-11-19"),
       MyOrder("D", "1", "2020-11-18 14:46:50", "2020-11-19")
     )

     val insertUpdate1Data = Seq(
       MyOrder("A", "2", "2020-11-18 14:50:32", "2020-11-19"),
       MyOrder("B", "2", "2020-11-18 14:50:21", "2020-11-19"),
       MyOrder("C", "2", "2020-11-18 14:50:19", "2020-11-19"),
       MyOrder("D", "2", "2020-11-18 14:50:50", "2020-11-19")
     )

     val insertUpdate2Data = Seq(
       MyOrder("A", "3", "2020-11-18 14:53:32", "2020-11-19"),
       MyOrder("B", "3", "2020-11-18 14:52:21", "2020-11-19"),
       MyOrder("C", "3", "2020-11-18 14:57:19", "2020-11-19"),
       MyOrder("D", "3", "2020-11-18 14:56:50", "2020-11-19")
     )

     val spark = SparkSession.builder.appName("MORTest")
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .config("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse")
       .enableHiveSupport().getOrCreate()

     val hudi_table = "hudi_hive_read_write_mor_5"

     val base_path = s"/data/hudi_demo/$hudi_table"

     def run(op: Int) = {
       val (data, saveMode) = op match {
         case 1 => (overwrite1Data, SaveMode.Overwrite)
         case 2 => (insertUpdate1Data, SaveMode.Append)
         case 3 => (insertUpdate2Data, SaveMode.Append)
       }
       import spark.implicits._
       val insertData = spark.createDataset(data)
       insertData.write.format("hudi")
         .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "name")
         .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "creation_date")
         .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "xyz")
         .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, hudi_table)
         .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
         .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
         .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "dt")
         // table type: MOR
         .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
         // disable async compact
         .option(DataSourceWriteOptions.ASYNC_COMPACT_ENABLE_OPT_KEY, "false")
         .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, 100)
         // disable inline compact
         .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, "false")
         .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://10.41.90.208:1")
         .option(HoodieWriteConfig.TABLE_NAME, hudi_table)
         .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.MultiPartKeysValueExtractor")
         .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
         .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "dt")
         .option("hoodie.insert.shuffle.parallelism", "2")
         .option("hoodie.upsert.shuffle.parallelism", "2")
         .mode(saveMode)
         .save(base_path)
     }

     def main(args: Array[String]): Unit = {
       // do overwrite
       run(1)

       // do upsert
       run(2)

       // do upsert
       run(3)

       println("===MOR is done=")
     }

   }
   ```
   
   
   
   
   
   
   
   
   
   







[GitHub] [hudi] bithw1 commented on issue #2276: [SUPPORT] java.lang.IllegalStateException: No Compaction request available

2020-11-24 Thread GitBox


bithw1 commented on issue #2276:
URL: https://github.com/apache/hudi/issues/2276#issuecomment-733439132


   Thanks @bvaradar , The files on hdfs are:
   
   ```
0 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/.aux
0 2020-11-22 10:01 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/.temp
 1596 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100045.deltacommit
  979 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100045.deltacommit.inflight
0 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100045.deltacommit.requested
 1646 2020-11-22 10:01 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100057.deltacommit
 1639 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100057.deltacommit.inflight
0 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100057.deltacommit.requested
 1647 2020-11-22 10:01 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100101.deltacommit
 1639 2020-11-22 10:01 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100101.deltacommit.inflight
0 2020-11-22 10:01 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/20201122100101.deltacommit.requested
0 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/archived
  339 2020-11-22 10:00 
/data/hudi_demo/hudi_hive_read_write_mor_5/.hoodie/hoodie.properties
   ```
   
   When I run compaction with any of the commit times (20201122100045, 20201122100057,
20201122100101), it always complains: No Compaction request available







[jira] [Assigned] (HUDI-981) Use rocksDB as flink state backend

2020-11-24 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu reassigned HUDI-981:


Assignee: chijunqing  (was: wangxianghu)

> Use rocksDB as flink state backend
> --
>
> Key: HUDI-981
> URL: https://issues.apache.org/jira/browse/HUDI-981
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: wangxianghu
>Assignee: chijunqing
>Priority: Major
>
> Use rocksDB as flink state backend 





[GitHub] [hudi] SteNicholas commented on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2020-11-24 Thread GitBox


SteNicholas commented on pull request #2111:
URL: https://github.com/apache/hudi/pull/2111#issuecomment-733426822


   > @SteNicholas still interested in driving this forward?
   
   @vinothchandar, yes, I have discussed with @leesf offline. It will be completed this week.







[GitHub] [hudi] asharma4-lucid commented on issue #2269: [SUPPORT] - HUDI Table Bulk Insert for 5 gb parquet file progressively taking longer time to insert.

2020-11-24 Thread GitBox


asharma4-lucid commented on issue #2269:
URL: https://github.com/apache/hudi/issues/2269#issuecomment-733323629


   Yes this is a COW table.







[GitHub] [hudi] bvaradar commented on issue #2277: [SUPPORT]

2020-11-24 Thread GitBox


bvaradar commented on issue #2277:
URL: https://github.com/apache/hudi/issues/2277#issuecomment-733305873


   @umehrot2 : Can you please take a look at this ?







[GitHub] [hudi] bvaradar commented on issue #2276: [SUPPORT] java.lang.IllegalStateException: No Compaction request available

2020-11-24 Thread GitBox


bvaradar commented on issue #2276:
URL: https://github.com/apache/hudi/issues/2276#issuecomment-733304908


   You can use hudi-cli and run "compactions show all" to list compactions and
find the timestamp of one that is pending.
   Another option is to list the .hoodie folder and find all the files ending in
.compaction.requested that have no corresponding .commit file
present. These are the pending compactions, which you can use to run compaction.
   







[GitHub] [hudi] bvaradar commented on issue #2269: [SUPPORT] - HUDI Table Bulk Insert for 5 gb parquet file progressively taking longer time to insert.

2020-11-24 Thread GitBox


bvaradar commented on issue #2269:
URL: https://github.com/apache/hudi/issues/2269#issuecomment-733299492


   @asharma4-lucid : ~5hrs is way too much. Can you disable cleaning using the
config hoodie.clean.automatic=false and try? Is this a COW table?
   







[GitHub] [hudi] vinothchandar commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

2020-11-24 Thread GitBox


vinothchandar commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-733204015


   @giaosudau that seems like JVM crash. Not sure what in this PR could crash 
that.
   Do you have more diagnostic info? 







[GitHub] [hudi] asharma4-lucid commented on issue #2269: [SUPPORT] - HUDI Table Bulk Insert for 5 gb parquet file progressively taking longer time to insert.

2020-11-24 Thread GitBox


asharma4-lucid commented on issue #2269:
URL: https://github.com/apache/hudi/issues/2269#issuecomment-733174238


   Thanks @bvaradar. I tried to insert just 5 records to the existing table 
with ~300K partitions and it took close to ~5 hrs. If I insert ~5 records in a 
new table it takes less than 2 mins. Is this extra time of ~5 hrs all because 
of cleaner and compaction processes? For our use case, we mostly get inserts. 
With that in mind, would it be beneficial for us if we switch to MOR from COW 
and do async compaction (I am most likely making an incorrect assumption that 
this huge extra processing time is only because of compaction) ? And also, 
since our data does not have frequent record level updates, would switching to 
MOR make any difference?







[GitHub] [hudi] codecov-io edited a comment on pull request #2278: [HUDI-1412] Make HoodieWriteConfig support setting different default …

2020-11-24 Thread GitBox


codecov-io edited a comment on pull request #2278:
URL: https://github.com/apache/hudi/pull/2278#issuecomment-733020702


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=h1) Report
   > Merging 
[#2278](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=desc) (12b85dc) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/0ebef1c0a0e4b96616ee7e4372d3b9f0eb83a919?el=desc)
 (0ebef1c) will **decrease** coverage by `43.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2278/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2278   +/-   ##
   =
   - Coverage 53.55%   10.41%   -43.15% 
   + Complexity 2774   48 -2726 
   =
 Files   348   50  -298 
 Lines 16115 1777-14338 
 Branches   1640  211 -1429 
   =
   - Hits   8631  185 -8446 
   + Misses 6785 1579 -5206 
   + Partials699   13  -686 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <ø> (-59.66%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> 

[GitHub] [hudi] wangxianghu commented on pull request #2271: [WIP][HUDI-1335] Introduce FlinkHoodieSimpleIndex to hudi-flink-client

2020-11-24 Thread GitBox


wangxianghu commented on pull request #2271:
URL: https://github.com/apache/hudi/pull/2271#issuecomment-733026404


   blocked by https://github.com/apache/hudi/pull/2278







[GitHub] [hudi] wangxianghu removed a comment on pull request #2271: [WIP][HUDI-1335] Introduce FlinkHoodieSimpleIndex to hudi-flink-client

2020-11-24 Thread GitBox


wangxianghu removed a comment on pull request #2271:
URL: https://github.com/apache/hudi/pull/2271#issuecomment-733023377


   blocked by https://github.com/apache/hudi/pull/2278







[GitHub] [hudi] codecov-io edited a comment on pull request #2278: [HUDI-1412] Make HoodieWriteConfig support setting different default …

2020-11-24 Thread GitBox


codecov-io edited a comment on pull request #2278:
URL: https://github.com/apache/hudi/pull/2278#issuecomment-733020702


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=h1) Report
   > Merging 
[#2278](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=desc) (40c6d23) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/0ebef1c0a0e4b96616ee7e4372d3b9f0eb83a919?el=desc)
 (0ebef1c) will **decrease** coverage by `43.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2278/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2278   +/-   ##
   =
   - Coverage 53.55%   10.41%   -43.15% 
   + Complexity 2774   48 -2726 
   =
 Files   348   50  -298 
 Lines 16115 1777-14338 
 Branches   1640  211 -1429 
   =
   - Hits   8631  185 -8446 
   + Misses 6785 1579 -5206 
   + Partials699   13  -686 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <ø> (-59.66%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | ... and 

[GitHub] [hudi] wangxianghu commented on pull request #2271: [WIP][HUDI-1335] Introduce FlinkHoodieSimpleIndex to hudi-flink-client

2020-11-24 Thread GitBox


wangxianghu commented on pull request #2271:
URL: https://github.com/apache/hudi/pull/2271#issuecomment-733023377


   blocked by https://github.com/apache/hudi/pull/2278







[GitHub] [hudi] wangxianghu commented on pull request #2278: [HUDI-1412] Make HoodieWriteConfig support setting different default …

2020-11-24 Thread GitBox


wangxianghu commented on pull request #2278:
URL: https://github.com/apache/hudi/pull/2278#issuecomment-733022202


   @yanghua please take a look when free







[GitHub] [hudi] codecov-io commented on pull request #2278: [HUDI-1412] Make HoodieWriteConfig support setting different default …

2020-11-24 Thread GitBox


codecov-io commented on pull request #2278:
URL: https://github.com/apache/hudi/pull/2278#issuecomment-733020702


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=h1) Report
   > Merging 
[#2278](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=desc) (12b85dc) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/0ebef1c0a0e4b96616ee7e4372d3b9f0eb83a919?el=desc)
 (0ebef1c) will **decrease** coverage by `43.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2278/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2278   +/-   ##
   =
   - Coverage 53.55%   10.41%   -43.15% 
   + Complexity 2774   48 -2726 
   =
 Files   348   50  -298 
 Lines 16115 1777-14338 
 Branches   1640  211 -1429 
   =
   - Hits   8631  185 -8446 
   + Misses 6785 1579 -5206 
   + Partials699   13  -686 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <ø> (-59.66%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2278?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2278/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | ... and [324 

[jira] [Updated] (HUDI-1412) Make HoodieWriteConfig support setting different default value according to engine type

2020-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1412:
-
Labels: pull-request-available  (was: )

> Make HoodieWriteConfig support setting different default value according to 
> engine type
> ---
>
> Key: HUDI-1412
> URL: https://issues.apache.org/jira/browse/HUDI-1412
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
>
> Currently, `HoodieIndexConfig` sets its default index type to bloom, which is
> suitable for the Spark engine.
> But since Hudi now supports the Flink engine, the default values should be
> set according to the engine the user uses.





[GitHub] [hudi] wangxianghu opened a new pull request #2278: [HUDI-1412] Make HoodieWriteConfig support setting different default …

2020-11-24 Thread GitBox


wangxianghu opened a new pull request #2278:
URL: https://github.com/apache/hudi/pull/2278


   …value according to engine type
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Make HoodieWriteConfig support setting different default value according to 
engine type*
   
   ## Brief change log
   
   Currently, `HoodieIndexConfig` sets its default index type to bloom, which is
suitable for the Spark engine.
   
   But since Hudi now supports the Flink engine, the default values should be set
according to the engine the user uses.
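   
   A rough sketch of the idea (the type and method names below are made up for illustration and are not the actual Hudi API):
   
   ```
   // Illustrative only: choose a different default index type per engine,
   // instead of hard-coding the Spark-oriented bloom default.
   sealed trait EngineType
   case object SparkEngine extends EngineType
   case object FlinkEngine extends EngineType
   
   def defaultIndexType(engine: EngineType): String = engine match {
     case SparkEngine => "BLOOM"    // bloom-filter index suits Spark batch upserts
     case FlinkEngine => "INMEMORY" // assumed streaming-friendly default, purely illustrative
   }
   ```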
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as 
*org.apache.hudi.config.TestHoodieWriteConfig#testDefaultIndexAccordingToEngineType*.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Updated] (HUDI-1412) Make HoodieWriteConfig support setting different default value according to engine type

2020-11-24 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu updated HUDI-1412:
--
Description: 
Currently, `HoodieIndexConfig` sets its default index type to bloom, which is
suitable for the Spark engine.

But since Hudi now supports the Flink engine, the default values should be set
according to the engine the user uses.

> Make HoodieWriteConfig support setting different default value according to 
> engine type
> ---
>
> Key: HUDI-1412
> URL: https://issues.apache.org/jira/browse/HUDI-1412
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>
> Currently, `HoodieIndexConfig` sets its default index type to bloom, which is
> suitable for the Spark engine.
> But since Hudi now supports the Flink engine, the default values should be
> set according to the engine the user uses.





[jira] [Updated] (HUDI-1412) Make HoodieWriteConfig support setting different default value according to engine type

2020-11-24 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu updated HUDI-1412:
--
Summary: Make HoodieWriteConfig support setting different default value 
according to engine type  (was: Make HoodieConfig support setting different 
default value according to engine type)

> Make HoodieWriteConfig support setting different default value according to 
> engine type
> ---
>
> Key: HUDI-1412
> URL: https://issues.apache.org/jira/browse/HUDI-1412
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>






[GitHub] [hudi] codecov-io edited a comment on pull request #2216: [HUDI-1357] Added a check to ensure no records are lost during updates.

2020-11-24 Thread GitBox


codecov-io edited a comment on pull request #2216:
URL: https://github.com/apache/hudi/pull/2216#issuecomment-729776111


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2216?src=pr=h1) Report
   > Merging 
[#2216](https://codecov.io/gh/apache/hudi/pull/2216?src=pr=desc) (c8f05c9) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/6310a2307abba94c7ff8a770f45462deae2c312e?el=desc)
 (6310a23) will **decrease** coverage by `43.26%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2216/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2216?src=pr=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2216   +/-   ##
   =
   - Coverage 53.67%   10.41%   -43.27% 
   + Complexity 2849   48 -2801 
   =
 Files   359   50  -309 
 Lines 16565 1777-14788 
 Branches   1782  211 -1571 
   =
   - Hits   8892  185 -8707 
   + Misses 6916 1579 -5337 
   + Partials757   13  -744 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <ø> (-59.69%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2216?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2216/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | ... and 

[jira] [Commented] (HUDI-1414) HoodieInputFormat support for bucketed partitions

2020-11-24 Thread linshan-ma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238081#comment-17238081
 ] 

linshan-ma commented on HUDI-1414:
--

I'm interested in this ticket. I want to try it.

> HoodieInputFormat support for bucketed partitions
> -
>
> Key: HUDI-1414
> URL: https://issues.apache.org/jira/browse/HUDI-1414
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Presto Integration
>Reporter: Satish Kotha
>Priority: Major
> Fix For: 0.8.0
>
>
> When querying a hoodie partition through presto, we get following error:
> {code}
> Presto error: {u'errorCode': 13, u'message': u'Presto cannot read bucketed 
> partition in an input format with UseFileSplitsFromInputFormat annotation: 
> HoodieInputFormat', u'errorType': u'USER_ERROR', u'failureInfo': 
> {u'suppressed': [], u'message': u'Presto cannot read bucketed partition in an 
> input format with UseFileSplitsFromInputFormat annotation: 
> HoodieInputFormat', u'type': u'com.facebook.presto.spi.PrestoException', 
> u'stack': 
> [u'com.facebook.presto.hive.BackgroundHiveSplitLoader.lambda$loadPartition$5(BackgroundHiveSplitLoader.java:432)',
>  
> u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)',
>  u'java.base/java.security.AccessController.doPrivileged(Native Method)', 
> u'java.base/javax.security.auth.Subject.doAs(Subject.java:361)', 
> u'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1816)',
>  
> u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)',
>  
> u'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:430)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:330)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:116)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:259)',
>  
> u'com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)',
>  
> u'com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)',
>  
> u'com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)',
>  
> u'com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)',
>  
> u'java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)',
>  
> u'java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)',
>  u'java.base/java.lang.Thread.run(Thread.java:834)']}, u'errorName': 
> u'NOT_SUPPORTED'}
> {code}
> Figure out how to add support for bucketed partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1414) HoodieInputFormat support for bucketed partitions

2020-11-24 Thread linshan-ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

linshan-ma reassigned HUDI-1414:


Assignee: linshan-ma

> HoodieInputFormat support for bucketed partitions
> -
>
> Key: HUDI-1414
> URL: https://issues.apache.org/jira/browse/HUDI-1414
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Presto Integration
>Reporter: Satish Kotha
>Assignee: linshan-ma
>Priority: Major
> Fix For: 0.8.0
>
>
> When querying a hoodie partition through Presto, we get the following error:
> {code}
> Presto error: {u'errorCode': 13, u'message': u'Presto cannot read bucketed 
> partition in an input format with UseFileSplitsFromInputFormat annotation: 
> HoodieInputFormat', u'errorType': u'USER_ERROR', u'failureInfo': 
> {u'suppressed': [], u'message': u'Presto cannot read bucketed partition in an 
> input format with UseFileSplitsFromInputFormat annotation: 
> HoodieInputFormat', u'type': u'com.facebook.presto.spi.PrestoException', 
> u'stack': 
> [u'com.facebook.presto.hive.BackgroundHiveSplitLoader.lambda$loadPartition$5(BackgroundHiveSplitLoader.java:432)',
>  
> u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)',
>  u'java.base/java.security.AccessController.doPrivileged(Native Method)', 
> u'java.base/javax.security.auth.Subject.doAs(Subject.java:361)', 
> u'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1816)',
>  
> u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)',
>  
> u'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:430)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:330)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:116)',
>  
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:259)',
>  
> u'com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)',
>  
> u'com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)',
>  
> u'com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)',
>  
> u'com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)',
>  
> u'java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)',
>  
> u'java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)',
>  u'java.base/java.lang.Thread.run(Thread.java:834)']}, u'errorName': 
> u'NOT_SUPPORTED'}
> {code}
> Figure out how to add support for bucketed partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2242: [HUDI-1366] Make deltasteamer support exporting data from hdfs to hudi

2020-11-24 Thread GitBox


liujinhui1994 commented on a change in pull request #2242:
URL: https://github.com/apache/hudi/pull/2242#discussion_r52946



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
##########
@@ -522,14 +523,18 @@ public static void main(String[] args) throws Exception {
   */
 private transient DeltaSync deltaSync;
 
+private final HoodieDeltaStreamerConfig deltaStreamerConfig;
+
 public DeltaSyncService(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration conf,
     Option<TypedProperties> properties) throws IOException {
+  this.props = properties.get();
   this.cfg = cfg;
   this.jssc = jssc;
   this.sparkSession = SparkSession.builder().config(jssc.getConf()).getOrCreate();
   this.asyncCompactService = Option.empty();
+  this.deltaStreamerConfig = new HoodieDeltaStreamerConfig(props);
 
-  if (fs.exists(new Path(cfg.targetBasePath))) {
+  if (fs.exists(new Path(cfg.targetBasePath)) && !deltaStreamerConfig.getFullOverwrite()) {

Review comment:
   The parameter itself only acts on DFSSource, so is the command-line tool 
appropriate?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2242: [HUDI-1366] Make deltasteamer support exporting data from hdfs to hudi

2020-11-24 Thread GitBox


liujinhui1994 commented on a change in pull request #2242:
URL: https://github.com/apache/hudi/pull/2242#discussion_r52946



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
##########
@@ -522,14 +523,18 @@ public static void main(String[] args) throws Exception {
   */
 private transient DeltaSync deltaSync;
 
+private final HoodieDeltaStreamerConfig deltaStreamerConfig;
+
 public DeltaSyncService(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration conf,
     Option<TypedProperties> properties) throws IOException {
+  this.props = properties.get();
   this.cfg = cfg;
   this.jssc = jssc;
   this.sparkSession = SparkSession.builder().config(jssc.getConf()).getOrCreate();
   this.asyncCompactService = Option.empty();
+  this.deltaStreamerConfig = new HoodieDeltaStreamerConfig(props);
 
-  if (fs.exists(new Path(cfg.targetBasePath))) {
+  if (fs.exists(new Path(cfg.targetBasePath)) && !deltaStreamerConfig.getFullOverwrite()) {

Review comment:
   The parameter itself only acts on DFSSource.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] santas-little-helper-13 opened a new issue #2277: [SUPPORT]

2020-11-24 Thread GitBox


santas-little-helper-13 opened a new issue #2277:
URL: https://github.com/apache/hudi/issues/2277


   Hi,
   
   I am working with Hudi in AWS Glue, and I have a problem with Hudi updates.
   
   I have one Glue job that inserts data into Hudi parquet files: it reads 
data from a Glue table, does some processing, gets the max ID_key from the 
already existing data, and adds it to the row number so that ID_key is unique 
at the whole-table level.
   Now I have another Glue job in which I read from that Hudi table:
   
   `hudiDF = spark.read.format("hudi").load('s3://prct-parquet-tgt/test_task1' + "/*")`
   
   limit it to just one record, and change one column plus the upd_ind column, 
which is the precombine field (all records have 0 as upd_ind by default):
   
   `updateDF = hudiDF.limit(1).withColumn('sequence', lit('new_value')).withColumn('upd_ind', lit(1))`
   
   then I define the Hudi write options:
   
   ```
   hoodie_write_options = {
       'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
       'hoodie.parquet.compression.codec': 'snappy',
       'hoodie.table.name': 'test_task1',
       'hoodie.datasource.write.recordkey.field': 'ID_key',
       'hoodie.datasource.write.hive_style_partitioning': True,
       'hoodie.datasource.write.table.name': 'test_task1',
       'hoodie.datasource.write.operation': 'upsert',
       'hoodie.datasource.write.precombine.field': 'upd_ind',
       'hoodie.datasource.write.insert.drop.duplicates': True,
       'hoodie.datasource.write.partitionpath.field': 'datehour',
       'hoodie.upsert.shuffle.parallelism': 8,
       'hoodie.insert.shuffle.parallelism': 8,
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
       'hoodie.parquet.small.file.limit': 0
   }
   ```
   
   and write the updated row:
   
   
`updateDF.write.format('hudi').options(**hoodie_write_options).mode('append').save('s3://prct-parquet-tgt/test_task1')`
   
   The problem is that the record that gets updated is random and has no 
connection to the record shown in the Glue job.
   If I select a specific record, the update isn't done at all:
   
   `updateDF = hudiDF.filter(col('ID_key')==64777).withColumn('sequence', lit('new_value')).withColumn('upd_ind', lit(1))`
   
   I need to update the exact record that I specify. Please help.
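   
   For illustration, here is a minimal PySpark sketch of selecting the target 
row deterministically by its record key instead of `limit(1)`. It assumes the 
`hoodie_write_options` dict and the table path shown above, and that ID_key 
may need an explicit string cast; it is only a sketch of how the upsert key is 
matched, not a confirmed fix for the behaviour reported here:
   
   ```
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.getOrCreate()

# Table path and hoodie_write_options are reused from the snippets above.
base_path = 's3://prct-parquet-tgt/test_task1'

# Read the current snapshot of the table.
hudiDF = spark.read.format("hudi").load(base_path + "/*")

# Select the row by its record key; limit(1) on an unordered DataFrame can
# return a different row on each run. The string cast is an assumption in
# case ID_key is not stored as a numeric type.
updateDF = (hudiDF
            .filter(col('ID_key').cast('string') == '64777')
            .withColumn('sequence', lit('new_value'))
            .withColumn('upd_ind', lit(1)))

# Sanity check before writing: exactly one row should match.
assert updateDF.count() == 1

# Reuse the same options (same key generator, recordkey and partitionpath
# fields) so the upsert is matched on the existing (ID_key, datehour) key.
(updateDF.write.format('hudi')
    .options(**hoodie_write_options)
    .mode('append')
    .save(base_path))
   ```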
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org