[jira] [Commented] (SPARK-40708) Auto update table statistics based on write metrics

2023-05-23 Thread GridGain Integration (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725491#comment-17725491
 ] 

GridGain Integration commented on SPARK-40708:
--

User 'jackylee-ch' has created a pull request for this issue:
https://github.com/apache/spark/pull/40944

> Auto update table statistics based on write metrics
> ---
>
>                 Key: SPARK-40708
>                 URL: https://issues.apache.org/jira/browse/SPARK-40708
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> {code:scala}
>   // Get write statistics
>   def getWriteStats(mode: SaveMode, metrics: Map[String, SQLMetric]): Option[WriteStats] = {
>     val numBytes = metrics.get(NUM_OUTPUT_BYTES_KEY).map(_.value).map(BigInt(_))
>     val numRows = metrics.get(NUM_OUTPUT_ROWS_KEY).map(_.value).map(BigInt(_))
>     numBytes.map(WriteStats(mode, _, numRows))
>   }
>
>   // Update table statistics
>   val stat = wroteStats.get
>   stat.mode match {
>     case SaveMode.Overwrite | SaveMode.ErrorIfExists =>
>       catalog.alterTableStats(table.identifier,
>         Some(CatalogStatistics(stat.numBytes, stat.numRows)))
>     case _ if table.stats.nonEmpty => // SaveMode.Append
>       catalog.alterTableStats(table.identifier, None)
>     case _ => // SaveMode.Ignore: do nothing
>   }
> {code}
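
For readers without a Spark checkout, the decision logic in the quoted snippet can be sketched in plain Scala. This is a minimal sketch under stated assumptions, not the code from any of the linked pull requests: `SaveMode`, `WriteStats`, `TableStats`, `StatsAction` and `decide` are hypothetical stand-ins defined here, metrics are modeled as a `Map[String, Long]` instead of `SQLMetric`, and the key strings are illustrative values.

```scala
// Stand-in types only: hypothetical simplifications, not Spark's real classes.
object WriteStatsSketch {
  sealed trait SaveMode
  object SaveMode {
    case object Append extends SaveMode
    case object Overwrite extends SaveMode
    case object ErrorIfExists extends SaveMode
    case object Ignore extends SaveMode
  }

  final case class WriteStats(mode: SaveMode, numBytes: BigInt, numRows: Option[BigInt])
  final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

  // What should happen to the catalog entry after the write (hypothetical ADT).
  sealed trait StatsAction
  final case class SetStats(stats: TableStats) extends StatsAction
  case object ClearStats extends StatsAction
  case object KeepStats extends StatsAction

  // Illustrative key names (assumption).
  val NumOutputBytesKey = "numOutputBytes"
  val NumOutputRowsKey = "numOutputRows"

  // The byte count is required; the row count is optional.
  def getWriteStats(mode: SaveMode, metrics: Map[String, Long]): Option[WriteStats] = {
    val numBytes = metrics.get(NumOutputBytesKey).map(BigInt(_))
    val numRows = metrics.get(NumOutputRowsKey).map(BigInt(_))
    numBytes.map(WriteStats(mode, _, numRows))
  }

  // Mirrors the match in the quoted snippet: a full rewrite yields exact stats,
  // any other write invalidates existing stats, otherwise nothing changes.
  def decide(stat: WriteStats, existing: Option[TableStats]): StatsAction =
    stat.mode match {
      case SaveMode.Overwrite | SaveMode.ErrorIfExists =>
        SetStats(TableStats(stat.numBytes, stat.numRows))
      case _ if existing.nonEmpty => ClearStats
      case _ => KeepStats
    }
}
```

Returning an action value instead of calling the catalog directly is a choice made only for this sketch, so that the three branches can be checked without a metastore.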



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40708) Auto update table statistics based on write metrics

2022-12-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649017#comment-17649017
 ] 

Apache Spark commented on SPARK-40708:
--

User 'jackylee-ch' has created a pull request for this issue:
https://github.com/apache/spark/pull/39114




[jira] [Commented] (SPARK-40708) Auto update table statistics based on write metrics

2022-11-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628234#comment-17628234
 ] 

Apache Spark commented on SPARK-40708:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/38496




[jira] [Commented] (SPARK-40708) Auto update table statistics based on write metrics

2022-11-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628235#comment-17628235
 ] 

Apache Spark commented on SPARK-40708:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/38496




[jira] [Commented] (SPARK-40708) Auto update table statistics based on write metrics

2022-10-24 Thread Jackey Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623543#comment-17623543
 ] 

Jackey Lee commented on SPARK-40708:


Maybe we can use `CommitProtocol`, which can return more metrics information, 
to feed partition metrics back to the driver, and then use 
`CommandUtils#updateTableStats` to update the table/partition metrics.
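
A rough sketch of the driver-side half of this idea, in plain Scala: each task reports per-partition byte and row counts along with its commit, and the driver merges them. The names `PartitionMetrics`, `TaskCommitStats` and `mergeOnDriver` are hypothetical and exist only in this sketch; how such a payload would actually travel through the commit protocol is not shown and is an assumption.

```scala
object PartitionMetricsSketch {
  // Hypothetical per-partition counters a task could attach to its commit.
  final case class PartitionMetrics(numBytes: BigInt, numRows: BigInt) {
    def +(other: PartitionMetrics): PartitionMetrics =
      PartitionMetrics(numBytes + other.numBytes, numRows + other.numRows)
  }

  // Hypothetical payload returned by one task: metrics keyed by partition path.
  final case class TaskCommitStats(byPartition: Map[String, PartitionMetrics])

  // Driver side: sum the counters of all committed tasks, partition by partition.
  def mergeOnDriver(messages: Seq[TaskCommitStats]): Map[String, PartitionMetrics] =
    messages.foldLeft(Map.empty[String, PartitionMetrics]) { (acc, msg) =>
      msg.byPartition.foldLeft(acc) { case (m, (partition, metrics)) =>
        m.updated(partition, m.get(partition).fold(metrics)(_ + metrics))
      }
    }
}
```

The merged map would then be the input to whatever updates the table and partition entries in the catalog.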




[jira] [Commented] (SPARK-40708) Auto update table statistics based on write metrics

2022-10-24 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623533#comment-17623533
 ] 

Yang Jie commented on SPARK-40708:
--

After SPARK-38573, the CommandUtils#updateTableStats method began to support 
updating stats at the partition level. 

For this ticket, maybe we should add more partition-level metrics, so that 
they can completely replace the CommandUtils#updateTableStats method.
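
To illustrate what partition-level metrics could buy, here is a small plain-Scala sketch of one possible policy (an assumption, not something this ticket specifies): partitions rewritten by a job get the measured stats, and a table-level total is reported only when every partition has known stats. `Stats`, `applyOverwrite` and `tableTotal` are hypothetical names used only here.

```scala
object PartitionStatsRollupSketch {
  final case class Stats(sizeInBytes: BigInt, rowCount: BigInt)

  // Replace the stats of the partitions that this job overwrote;
  // untouched partitions keep whatever was recorded before (possibly None).
  def applyOverwrite(
      existing: Map[String, Option[Stats]],
      written: Map[String, Stats]): Map[String, Option[Stats]] =
    existing ++ written.map { case (partition, stats) => partition -> Option(stats) }

  // Table-level total, available only if no partition has unknown stats.
  def tableTotal(partitions: Map[String, Option[Stats]]): Option[Stats] =
    if (partitions.values.forall(_.nonEmpty)) {
      val zero = Stats(BigInt(0), BigInt(0))
      Some(partitions.values.flatten.foldLeft(zero) { (acc, s) =>
        Stats(acc.sizeInBytes + s.sizeInBytes, acc.rowCount + s.rowCount)
      })
    } else None
}
```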
 
 
