[ https://issues.apache.org/jira/browse/HUDI-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lamber-ken updated HUDI-388:
----------------------------
    Description: 
*Purpose*

Currently, Hudi offers several tools for operating an ecosystem of Hudi datasets, including hudi-cli, metrics, and the Spark UI [1]. It would be easier for admins to manage Hudi datasets through customized DDL SQL statements than through hudi-cli.

Since SPARK-18127, we can customize the Spark session with our own optimizer, parser, analyzer, and physical-plan strategy rules. The steps to extend a Spark session are:

1. A tool to parse the SparkSQL statements, such as ANTLR or regular expressions.
2. A class that receives org.apache.spark.sql.SparkSessionExtensions and injects the parser.
3. Run the customized statements by extending org.apache.spark.sql.execution.command.RunnableCommand.

*Demo*

1. Extend SparkSessionExtensions
{code:java}
class HudiSparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectParser { (session, parser) =>
      new HudiDDLParser(parser)
    }
  }
}
{code}

2. Extend RunnableCommand
{code:java}
case class HudiStatCommand(path: String) extends RunnableCommand {

  override val output: Seq[Attribute] = {
    Seq(
      AttributeReference("CommitTime", StringType, nullable = false)(),
      AttributeReference("Total Upserted", IntegerType, nullable = false)(),
      AttributeReference("Total Written", IntegerType, nullable = false)(),
      AttributeReference("Write Amplification Factor", DoubleType, nullable = false)()
    )
  }

  override def run(sparkSession: SparkSession): Seq[Row] = {
    Seq(
      Row("20191207003131", 0, 10, 0.1),
      Row("20191207003200", 4, 10, 2.50),
      Row("Total", 4, 20, 5.00)
    )
  }
}
{code}

[https://github.com/lamber-ken/hudi-work]
[http://hudi.apache.org/admin_guide.html]
https://issues.apache.org/jira/browse/SPARK-18127

  was:
*Purpose*

Currently, Hudi offers several tools for operating an ecosystem of Hudi datasets, including hudi-cli, metrics, and the Spark UI [1]. It would be easier for admins to manage Hudi datasets through customized DDL SQL statements than through hudi-cli.

Since SPARK-18127, we can customize the Spark session with our own optimizer, parser, analyzer, and physical-plan strategy rules. The steps to extend a Spark session are:

1. A tool to parse the SparkSQL statements, such as ANTLR or regular expressions.
2. A class that receives org.apache.spark.sql.SparkSessionExtensions and injects the parser.
3. Run the customized statements by extending org.apache.spark.sql.execution.command.RunnableCommand.

*Demo*

1. Extend SparkSessionExtensions
{code:java}
class HudiSparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectParser { (session, parser) =>
      new HudiDDLParser(parser)
    }
  }
}
{code}

2. Extend RunnableCommand
{code:java}
case class HudiStatCommand(path: String) extends RunnableCommand {

  override val output: Seq[Attribute] = {
    Seq(
      AttributeReference("CommitTime", IntegerType, nullable = false)(),
      AttributeReference("Total Upserted", IntegerType, nullable = false)(),
      AttributeReference("Total Written", IntegerType, nullable = false)(),
      AttributeReference("Write Amplification Factor", IntegerType, nullable = false)()
    )
  }

  override def run(sparkSession: SparkSession): Seq[Row] = {
    Seq(
      Row("20191207003131", 0, 10, 0),
      Row("20191207003200", 4, 10, 2.50),
      Row("Total", 4, 20, 5.00)
    )
  }
}
{code}

[https://github.com/lamber-ken/hudi-work]
[http://hudi.apache.org/admin_guide.html]
https://issues.apache.org/jira/browse/SPARK-18127

> Support DDL / DML SparkSQL statements which are useful for admins
> -----------------------------------------------------------------
>
>                 Key: HUDI-388
>                 URL: https://issues.apache.org/jira/browse/HUDI-388
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>            Reporter: lamber-ken
>            Assignee: lamber-ken
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
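Step 1 of the proposal (recognizing a custom statement before delegating to Spark's own parser) can be sketched without any Spark dependency. This is a minimal, hypothetical sketch: the `hudi stats '<path>'` syntax and the `HudiStatementParser` object are illustrative assumptions, not an existing Hudi API.

```scala
// Minimal sketch of step 1: recognize a hypothetical "hudi stats '<path>'"
// statement with a regular expression. A real HudiDDLParser would fall back
// to the wrapped Spark ParserInterface when the statement does not match.
object HudiStatementParser {
  // Case-insensitive pattern capturing the quoted dataset path.
  private val StatPattern = """(?i)^\s*hudi\s+stats\s+'([^']+)'\s*$""".r

  // Returns the dataset path if the statement matches, otherwise None.
  def parseStats(sql: String): Option[String] = sql match {
    case StatPattern(path) => Some(path)
    case _                 => None
  }
}
```

In a real deployment the demo's `HudiSparkSessionExtension` would be registered through the `spark.sql.extensions` configuration introduced by SPARK-18127, so that every session created with that configuration picks up the injected parser.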