[jira] [Updated] (SPARK-30525) HiveTableScanExec does not need to prune partitions again after pushing down to Hive metastore
[ https://issues.apache.org/jira/browse/SPARK-30525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30525:
------------------------------
Description:
In HiveTableScanExec, partition pruning is pushed down to the Hive metastore if _spark.sql.hive.metastorePartitionPruning_ is true, and the returned partitions are then pruned again using the partition filters, because some predicates, e.g. "b like 'xyz'", are not supported by the Hive metastore. This problem is already fixed in HiveExternalCatalog.listPartitionsByFilter, which can now return exactly the partitions we want, so it is no longer necessary to prune twice in HiveTableScanExec.

(was: In HiveTableScanExec, it will push down to hive metastore for partition pruning if spark.sql.hive.metastorePartitionPruning is true, and then it will prune the returned partitions again using partition filters, because some predicates, eg. "b like 'xyz'", are not supported in hive metastore. But now this problem is already fixed in HiveExternalCatalog.listPartitionsByFilter, the HiveExternalCatalog.listPartitionsByFilter can return exactly what we want now. So it is not necessary any more to double prune in HiveTableScanExec.)

> HiveTableScanExec does not need to prune partitions again after pushing down to Hive metastore
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-30525
> URL: https://issues.apache.org/jira/browse/SPARK-30525
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Hu Fuwang
> Priority: Major
>
> In HiveTableScanExec, partition pruning is pushed down to the Hive metastore if _spark.sql.hive.metastorePartitionPruning_ is true, and the returned partitions are then pruned again using the partition filters, because some predicates, e.g. "b like 'xyz'", are not supported by the Hive metastore. This problem is already fixed in HiveExternalCatalog.listPartitionsByFilter, which can now return exactly the partitions we want, so it is no longer necessary to prune twice in HiveTableScanExec.
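To make the "double prune" pattern above concrete, here is a small self-contained Scala sketch; the Partition case class and the filters are simplified stand-ins invented for illustration, not Spark's real classes.

{code:scala}
// Simplified model of HiveTableScanExec's historical behavior: prune via the
// metastore first, then re-apply the same partition filters client-side.
case class Partition(spec: Map[String, String])

object DoublePruneSketch {
  def main(args: Array[String]): Unit = {
    val allPartitions = Seq(
      Partition(Map("b" -> "xyz1")),
      Partition(Map("b" -> "abc")))

    // Step 1: the metastore can evaluate only a subset of predicates; for an
    // unsupported one like "b like 'xyz%'" it returns all partitions.
    val fromMetastore = allPartitions

    // Step 2: the redundant client-side prune this ticket removes, needed
    // historically because of predicates the metastore could not evaluate.
    val pruned = fromMetastore.filter(_.spec("b").startsWith("xyz"))
    println(pruned) // List(Partition(Map(b -> xyz1)))
  }
}
{code}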
[jira] [Updated] (SPARK-30525) HiveTableScanExec does not need to prune partitions again after pushing down to Hive metastore
[ https://issues.apache.org/jira/browse/SPARK-30525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30525:
------------------------------
Description:
In HiveTableScanExec, partition pruning is pushed down to the Hive metastore if spark.sql.hive.metastorePartitionPruning is true, and the returned partitions are then pruned again using the partition filters, because some predicates, e.g. "b like 'xyz'", are not supported by the Hive metastore. This problem is already fixed in HiveExternalCatalog.listPartitionsByFilter, which can now return exactly the partitions we want, so it is no longer necessary to prune twice in HiveTableScanExec.

(was: In HiveTableScanExec, it will push down to hive metastore for partition pruning if spark.sql.hive.metastorePartitionPruning is true, and then it will prune the returned partitions again using partition filters, because some predicates, eg. "b like 'xyz'", are not supported in hive metastore. But now this problem is already fixed in HiveExternalCatalog.listPartitionsByFilter, the HiveExternalCatalog.listPartitionsByFilter can return exactly what we want now. So it is not necessary any more to double prune in HiveTableScanExec.)

> HiveTableScanExec does not need to prune partitions again after pushing down to Hive metastore
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-30525
> URL: https://issues.apache.org/jira/browse/SPARK-30525
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Hu Fuwang
> Priority: Major
>
> In HiveTableScanExec, partition pruning is pushed down to the Hive metastore if spark.sql.hive.metastorePartitionPruning is true, and the returned partitions are then pruned again using the partition filters, because some predicates, e.g. "b like 'xyz'", are not supported by the Hive metastore. This problem is already fixed in HiveExternalCatalog.listPartitionsByFilter, which can now return exactly the partitions we want, so it is no longer necessary to prune twice in HiveTableScanExec.
[jira] [Created] (SPARK-30525) HiveTableScanExec does not need to prune partitions again after pushing down to Hive metastore
Hu Fuwang created SPARK-30525:
------------------------------
Summary: HiveTableScanExec does not need to prune partitions again after pushing down to Hive metastore
Key: SPARK-30525
URL: https://issues.apache.org/jira/browse/SPARK-30525
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.0
Reporter: Hu Fuwang

In HiveTableScanExec, partition pruning is pushed down to the Hive metastore if spark.sql.hive.metastorePartitionPruning is true, and the returned partitions are then pruned again using the partition filters, because some predicates, e.g. "b like 'xyz'", are not supported by the Hive metastore. This problem is already fixed in HiveExternalCatalog.listPartitionsByFilter, which can now return exactly the partitions we want, so it is no longer necessary to prune twice in HiveTableScanExec.
[jira] [Updated] (SPARK-30516) Statistics estimation of FileScan should take partitionFilters and partition number into account
[ https://issues.apache.org/jira/browse/SPARK-30516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30516:
------------------------------
Summary: Statistics estimation of FileScan should take partitionFilters and partition number into account  (was: FileScan.estimateStatistics does not take partitionFilters and partition number into account)

> Statistics estimation of FileScan should take partitionFilters and partition number into account
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-30516
> URL: https://issues.apache.org/jira/browse/SPARK-30516
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, FileScan.estimateStatistics does not take partitionFilters or the number of partitions into account, which may lead to an overestimated sizeInBytes. It is reasonable to involve partitionFilters and the partition number when estimating the statistics.
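A minimal sketch of the proposed adjustment, under the assumption that the estimate can simply be scaled by the fraction of partitions that survive the partition filters; the function name and the numbers are illustrative only, not Spark's actual implementation.

{code:scala}
// Scale the raw size estimate by the fraction of selected partitions instead
// of reporting the size of all files under the table's root paths.
object EstimateStatsSketch {
  def estimatedSizeInBytes(totalBytes: Long, totalPartitions: Int, selectedPartitions: Int): Long = {
    require(totalPartitions > 0 && selectedPartitions <= totalPartitions)
    (totalBytes * (selectedPartitions.toDouble / totalPartitions)).toLong
  }

  def main(args: Array[String]): Unit = {
    // 100 partitions, 2 selected by partitionFilters: the unscaled 1 GiB
    // estimate shrinks to roughly 20 MiB.
    println(estimatedSizeInBytes(1L << 30, 100, 2))
  }
}
{code}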
[jira] [Updated] (SPARK-30516) FileScan.estimateStatistics does not take partitionFilters and partition number into account
[ https://issues.apache.org/jira/browse/SPARK-30516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30516:
------------------------------
Description:
Currently, FileScan.estimateStatistics does not take partitionFilters or the number of partitions into account, which may lead to an overestimated sizeInBytes. It is reasonable to involve partitionFilters and the partition number when estimating the statistics.

(was: Currently, FileScan.estimateStatistics will not take partitionFilters into account, which may lead to bigger sizeInBytes. It should be reasonable to change it to involve partitionFilters and partition numbers when estimating the statistics.)

> FileScan.estimateStatistics does not take partitionFilters and partition number into account
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-30516
> URL: https://issues.apache.org/jira/browse/SPARK-30516
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, FileScan.estimateStatistics does not take partitionFilters or the number of partitions into account, which may lead to an overestimated sizeInBytes. It is reasonable to involve partitionFilters and the partition number when estimating the statistics.
[jira] [Updated] (SPARK-30516) FileScan.estimateStatistics does not take partitionFilters and partition number into account
[ https://issues.apache.org/jira/browse/SPARK-30516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30516:
------------------------------
Description:
Currently, FileScan.estimateStatistics does not take partitionFilters into account, which may lead to an overestimated sizeInBytes. It is reasonable to involve partitionFilters and the number of partitions when estimating the statistics.

(was: Currently, FileScan.estimateStatistics will not take partitionFilters into account, which may lead to bigger sizeInBytes. It should be reasonable to change it to involve partitionFilters and partition numbers when estimating the statistics.)

> FileScan.estimateStatistics does not take partitionFilters and partition number into account
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-30516
> URL: https://issues.apache.org/jira/browse/SPARK-30516
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, FileScan.estimateStatistics does not take partitionFilters into account, which may lead to an overestimated sizeInBytes. It is reasonable to involve partitionFilters and the number of partitions when estimating the statistics.
[jira] [Created] (SPARK-30516) FileScan.estimateStatistics does not take partitionFilters and partition number into account
Hu Fuwang created SPARK-30516:
------------------------------
Summary: FileScan.estimateStatistics does not take partitionFilters and partition number into account
Key: SPARK-30516
URL: https://issues.apache.org/jira/browse/SPARK-30516
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.0
Reporter: Hu Fuwang

Currently, FileScan.estimateStatistics does not take partitionFilters into account, which may lead to an overestimated sizeInBytes. It is reasonable to involve partitionFilters and the number of partitions when estimating the statistics.
[jira] [Updated] (SPARK-30469) Partition columns should not be involved when calculating sizeInBytes of Project logical plan
[ https://issues.apache.org/jira/browse/SPARK-30469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30469:
------------------------------
Description:
When getting the statistics of a Project logical plan, if CBO is not enabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which computes the ratio between the row size of the Project plan and that of its child plan.

The row size is computed based on the output attributes (columns).

Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involves the partition columns of Hive tables as well, which is not reasonable, because partition columns do not actually account for sizeInBytes.

This may make the sizeInBytes inaccurate.

(was: When getting the statistics of a Project logical plan, if CBO not enabled, Spark will call SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which will compute the ratio of the row size of the project plan and its child plan. And the row size is computed based on the output attributes (columns). Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involve partition columns of hive table as well, which is not reasonable, because hive partition column actually does not account for sizeInBytes. This may make the sizeInBytes not accurate.)

> Partition columns should not be involved when calculating sizeInBytes of Project logical plan
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-30469
> URL: https://issues.apache.org/jira/browse/SPARK-30469
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> When getting the statistics of a Project logical plan, if CBO is not enabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which computes the ratio between the row size of the Project plan and that of its child plan.
> The row size is computed based on the output attributes (columns).
> Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involves the partition columns of Hive tables as well, which is not reasonable, because partition columns do not actually account for sizeInBytes.
> This may make the sizeInBytes inaccurate.
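To make the ratio computation above concrete, here is a self-contained sketch; the Attribute type and the 8-byte row overhead are simplifications of Spark's size heuristics, not its real classes.

{code:scala}
// Model of visitUnaryNode's estimate: the child's sizeInBytes scaled by the
// ratio of the Project's row size to the child's row size.
case class Attribute(name: String, sizeInBytes: Long, isPartitionColumn: Boolean = false)

object ProjectSizeSketch {
  def rowSize(attrs: Seq[Attribute]): Long = 8 + attrs.map(_.sizeInBytes).sum

  def main(args: Array[String]): Unit = {
    val child = Seq(
      Attribute("a", 4),
      Attribute("b", 8),
      Attribute("dt", 8, isPartitionColumn = true)) // lives in the path, not in the data files
    val project = child.take(2)
    val childSizeInBytes = 1000L

    // Counting the partition column in the denominator deflates the estimate;
    // the ticket proposes excluding partition columns from the row size.
    val current  = childSizeInBytes * rowSize(project) / rowSize(child)
    val proposed = childSizeInBytes * rowSize(project) / rowSize(child.filterNot(_.isPartitionColumn))
    println(s"current=$current proposed=$proposed") // current=714 proposed=1000
  }
}
{code}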
[jira] [Updated] (SPARK-30469) Hive Partition columns should not be involved when calculating sizeInBytes of Project logical plan
[ https://issues.apache.org/jira/browse/SPARK-30469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30469:
------------------------------
Description:
When getting the statistics of a Project logical plan, if CBO is not enabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which computes the ratio between the row size of the Project plan and that of its child plan.

The row size is computed based on the output attributes (columns).

Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involves the partition columns of Hive tables as well, which is not reasonable, because Hive partition columns do not actually account for sizeInBytes.

This may make the sizeInBytes inaccurate.

(was: When getting the statistics of a Project logical plan, if CBO not enabled, Spark will call SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which will compute the ratio of the row size of the project plan and its child plan. And the row size is computed based on the out attributes (columns). Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involve partition columns of hive table as well, which is not reasonable, because hive partition column actually does not account for sizeInBytes. This may make the sizeInBytes not accurate.)

> Hive Partition columns should not be involved when calculating sizeInBytes of Project logical plan
> ----------------------------------------------------------------------------------------------------
>
> Key: SPARK-30469
> URL: https://issues.apache.org/jira/browse/SPARK-30469
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> When getting the statistics of a Project logical plan, if CBO is not enabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which computes the ratio between the row size of the Project plan and that of its child plan.
> The row size is computed based on the output attributes (columns).
> Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involves the partition columns of Hive tables as well, which is not reasonable, because Hive partition columns do not actually account for sizeInBytes.
> This may make the sizeInBytes inaccurate.
[jira] [Updated] (SPARK-30469) Partition columns should not be involved when calculating sizeInBytes of Project logical plan
[ https://issues.apache.org/jira/browse/SPARK-30469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30469:
------------------------------
Summary: Partition columns should not be involved when calculating sizeInBytes of Project logical plan  (was: Hive Partition columns should not be involved when calculating sizeInBytes of Project logical plan)

> Partition columns should not be involved when calculating sizeInBytes of Project logical plan
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-30469
> URL: https://issues.apache.org/jira/browse/SPARK-30469
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> When getting the statistics of a Project logical plan, if CBO is not enabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which computes the ratio between the row size of the Project plan and that of its child plan.
> The row size is computed based on the output attributes (columns).
> Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involves the partition columns of Hive tables as well, which is not reasonable, because Hive partition columns do not actually account for sizeInBytes.
> This may make the sizeInBytes inaccurate.
[jira] [Created] (SPARK-30469) Hive Partition columns should not be involved when calculating sizeInBytes of Project logical plan
Hu Fuwang created SPARK-30469:
------------------------------
Summary: Hive Partition columns should not be involved when calculating sizeInBytes of Project logical plan
Key: SPARK-30469
URL: https://issues.apache.org/jira/browse/SPARK-30469
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang

When getting the statistics of a Project logical plan, if CBO is not enabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to calculate the size in bytes, which computes the ratio between the row size of the Project plan and that of its child plan.

The row size is computed based on the output attributes (columns).

Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode involves the partition columns of Hive tables as well, which is not reasonable, because Hive partition columns do not actually account for sizeInBytes.

This may make the sizeInBytes inaccurate.
[jira] [Updated] (SPARK-30427) Add config item for limiting partition number when calculating statistics through File System
[ https://issues.apache.org/jira/browse/SPARK-30427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30427:
------------------------------
Description:
Currently, when Spark needs to calculate the statistics (e.g. sizeInBytes) of table partitions through the file system (e.g. HDFS), it does not consider the number of partitions. If the number of partitions is huge, calculating the statistics costs much time but may not be that useful. It is reasonable to add a config item limiting the number of partitions for which statistics may be calculated through the file system.

> Add config item for limiting partition number when calculating statistics through File System
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-30427
> URL: https://issues.apache.org/jira/browse/SPARK-30427
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, when Spark needs to calculate the statistics (e.g. sizeInBytes) of table partitions through the file system (e.g. HDFS), it does not consider the number of partitions. If the number of partitions is huge, calculating the statistics costs much time but may not be that useful. It is reasonable to add a config item limiting the number of partitions for which statistics may be calculated through the file system.
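A hedged sketch of the guard such a config item could enable; the config key named in the comment is hypothetical, not an existing Spark conf.

{code:scala}
// Skip the expensive file-system listing when a table has more partitions
// than the configured threshold.
object PartitionStatsGuard {
  // Would be read from the proposed config item, e.g.
  // "spark.sql.statistics.fallBackToFs.maxPartitionNum" (invented name).
  val maxPartitionNum = 1000

  def shouldComputeStatsViaFs(partitionCount: Int): Boolean =
    partitionCount <= maxPartitionNum

  def main(args: Array[String]): Unit = {
    println(shouldComputeStatsViaFs(50))     // true: cheap enough to list files
    println(shouldComputeStatsViaFs(100000)) // false: fall back to default stats
  }
}
{code}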
[jira] [Updated] (SPARK-30427) Add config item for limiting partition number when calculating statistics through File System
[ https://issues.apache.org/jira/browse/SPARK-30427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30427:
------------------------------
Summary: Add config item for limiting partition number when calculating statistics through File System  (was: Add config item for limiting partition number when calculating statistics through HDFS)

> Add config item for limiting partition number when calculating statistics through File System
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-30427
> URL: https://issues.apache.org/jira/browse/SPARK-30427
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
[jira] [Created] (SPARK-30427) Add config item for limiting partition number when calculating statistics through HDFS
Hu Fuwang created SPARK-30427:
------------------------------
Summary: Add config item for limiting partition number when calculating statistics through HDFS
Key: SPARK-30427
URL: https://issues.apache.org/jira/browse/SPARK-30427
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
[jira] [Resolved] (SPARK-30215) Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
[ https://issues.apache.org/jira/browse/SPARK-30215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang resolved SPARK-30215.
-------------------------------
Resolution: Not A Problem

> Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
> ----------------------------------------------------------------------------------
>
> Key: SPARK-30215
> URL: https://issues.apache.org/jira/browse/SPARK-30215
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> PrunedInMemoryFileIndex is only used in CatalogFileIndex.filterPartitions, and its name is kind of confusing; we can completely merge its functionality into InMemoryFileIndex and remove the class.
[jira] [Updated] (SPARK-30259) CREATE TABLE throws error when session catalog specified
[ https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30259:
------------------------------
Description:
Spark throws an error when the session catalog is specified explicitly in the "CREATE TABLE" and "CREATE TABLE AS SELECT" commands, e.g.
{code:java}
CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
{code}
The error message is as below:
{noformat}
19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
Error in query: Database 'spark_catalog' not found;
{noformat}

(was: Spark throw error when the session catalog is specified explicitly in "CREATE TABLE" and "CREATE TABLE AS SELECT" command, eg. {code:java} // code placeholder CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i; {code} the error message is like below: {noformat} 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException Error in query: Database 'spark_catalog' not found;{noformat})

> CREATE TABLE throws error when session catalog specified
> ---------------------------------------------------------
>
> Key: SPARK-30259
> URL: https://issues.apache.org/jira/browse/SPARK-30259
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Spark throws an error when the session catalog is specified explicitly in the "CREATE TABLE" and "CREATE TABLE AS SELECT" commands, e.g.
> {code:java}
> CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
> {code}
> The error message is as below:
> {noformat}
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
> 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
> Error in query: Database 'spark_catalog' not found;
> {noformat}
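A reproduction sketch via the SparkSession API, assuming a local build with Hive support that exhibits the reported behavior; the table names are arbitrary.

{code:scala}
import org.apache.spark.sql.SparkSession

// Reproduces the report: a two-part name whose first part is the session
// catalog's own name is misread as database "spark_catalog".
object SessionCatalogCtasRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").enableHiveSupport().getOrCreate()
    spark.sql("CREATE TABLE tbl USING json AS SELECT 1 AS i") // works
    // Fails with "Database 'spark_catalog' not found" on the affected build:
    spark.sql("CREATE TABLE spark_catalog.tbl2 USING json AS SELECT 1 AS i")
    spark.stop()
  }
}
{code}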
[jira] [Updated] (SPARK-30259) CREATE TABLE throws error when session catalog specified
[ https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30259:
------------------------------
Description:
Spark throws an error when the session catalog is specified explicitly in the "CREATE TABLE" and "CREATE TABLE AS SELECT" commands, e.g.
{code:java}
// code placeholder
CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
{code}
The error message is as below:
{noformat}
19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
Error in query: Database 'spark_catalog' not found;
{noformat}

(was: Spark throw error when the session catalog is specified explicitly in the CREATE TABLE AS SELECT command, eg. {code:java} // code placeholder CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i; {code} the error message is like below: {noformat} 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException Error in query: Database 'spark_catalog' not found;{noformat})

> CREATE TABLE throws error when session catalog specified
> ---------------------------------------------------------
>
> Key: SPARK-30259
> URL: https://issues.apache.org/jira/browse/SPARK-30259
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Spark throws an error when the session catalog is specified explicitly in the "CREATE TABLE" and "CREATE TABLE AS SELECT" commands, e.g.
> {code:java}
> // code placeholder
> CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
> {code}
> The error message is as below:
> {noformat}
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
> 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
> Error in query: Database 'spark_catalog' not found;
> {noformat}
[jira] [Updated] (SPARK-30259) CREATE TABLE throws error when session catalog specified
[ https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30259:
------------------------------
Summary: CREATE TABLE throws error when session catalog specified  (was: CREATE TABLE AS SELECT throws error when session catalog specified)

> CREATE TABLE throws error when session catalog specified
> ---------------------------------------------------------
>
> Key: SPARK-30259
> URL: https://issues.apache.org/jira/browse/SPARK-30259
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Spark throws an error when the session catalog is specified explicitly in the CREATE TABLE AS SELECT command, e.g.
> {code:java}
> // code placeholder
> CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
> {code}
> The error message is as below:
> {noformat}
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
> 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
> Error in query: Database 'spark_catalog' not found;
> {noformat}
[jira] [Updated] (SPARK-30259) CREATE TABLE AS SELECT throws error when session catalog specified
[ https://issues.apache.org/jira/browse/SPARK-30259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30259:
------------------------------
Description:
Spark throws an error when the session catalog is specified explicitly in the CREATE TABLE AS SELECT command, e.g.
{code:java}
// code placeholder
CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
{code}
The error message is as below:
{noformat}
19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
Error in query: Database 'spark_catalog' not found;
{noformat}

> CREATE TABLE AS SELECT throws error when session catalog specified
> -------------------------------------------------------------------
>
> Key: SPARK-30259
> URL: https://issues.apache.org/jira/browse/SPARK-30259
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Spark throws an error when the session catalog is specified explicitly in the CREATE TABLE AS SELECT command, e.g.
> {code:java}
> // code placeholder
> CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
> {code}
> The error message is as below:
> {noformat}
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
> 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
> Error in query: Database 'spark_catalog' not found;
> {noformat}
[jira] [Created] (SPARK-30259) CREATE TABLE AS SELECT throws error when session catalog specified
Hu Fuwang created SPARK-30259:
------------------------------
Summary: CREATE TABLE AS SELECT throws error when session catalog specified
Key: SPARK-30259
URL: https://issues.apache.org/jira/browse/SPARK-30259
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
[jira] [Created] (SPARK-30215) Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
Hu Fuwang created SPARK-30215:
------------------------------
Summary: Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
Key: SPARK-30215
URL: https://issues.apache.org/jira/browse/SPARK-30215
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang

PrunedInMemoryFileIndex is only used in CatalogFileIndex.filterPartitions, and its name is kind of confusing; we can completely merge its functionality into InMemoryFileIndex and remove the class.
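An illustrative sketch of one way the merge could look: InMemoryFileIndex accepting an optional, already-pruned partition spec instead of needing a dedicated subclass. The types here are simplified stand-ins, not Spark's real signatures.

{code:scala}
// If CatalogFileIndex.filterPartitions has already pruned the partitions, the
// resulting spec is passed in; otherwise it is discovered from the listing.
case class PartitionSpecSketch(partitionPaths: Seq[String])

class InMemoryFileIndexSketch(
    rootPaths: Seq[String],
    userSpecifiedPartitionSpec: Option[PartitionSpecSketch] = None) {

  def partitionSpec(): PartitionSpecSketch =
    userSpecifiedPartitionSpec.getOrElse(inferPartitioning())

  // Placeholder for real partition discovery over the listed files.
  private def inferPartitioning(): PartitionSpecSketch =
    PartitionSpecSketch(rootPaths)
}
{code}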
[jira] [Updated] (SPARK-30138) Separate configuration key of max iterations for analyzer and optimizer
[ https://issues.apache.org/jira/browse/SPARK-30138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30138:
------------------------------
Description:
Currently, both the Analyzer and the Optimizer use the conf "spark.sql.optimizer.maxIterations" to set the max iterations to run, which is a little confusing.

It is clearer to add a new conf "spark.sql.analyzer.maxIterations" for the analyzer's max iterations.

> Separate configuration key of max iterations for analyzer and optimizer
> ------------------------------------------------------------------------
>
> Key: SPARK-30138
> URL: https://issues.apache.org/jira/browse/SPARK-30138
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, both the Analyzer and the Optimizer use the conf "spark.sql.optimizer.maxIterations" to set the max iterations to run, which is a little confusing.
> It is clearer to add a new conf "spark.sql.analyzer.maxIterations" for the analyzer's max iterations.
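A short sketch of how the separated keys would be used; "spark.sql.analyzer.maxIterations" is the key this ticket proposes, shown here for illustration rather than as a guaranteed API.

{code:scala}
import org.apache.spark.sql.SparkSession

object MaxIterationsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    // Before: one key bounds both fixed-point rule executors.
    spark.conf.set("spark.sql.optimizer.maxIterations", "100")
    // After the proposal: the analyzer reads its own key (presumably falling
    // back to the optimizer key when unset, for compatibility).
    spark.conf.set("spark.sql.analyzer.maxIterations", "200")
    spark.stop()
  }
}
{code}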
[jira] [Created] (SPARK-30138) Separate configuration key of max iterations for analyzer and optimizer
Hu Fuwang created SPARK-30138:
------------------------------
Summary: Separate configuration key of max iterations for analyzer and optimizer
Key: SPARK-30138
URL: https://issues.apache.org/jira/browse/SPARK-30138
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
[jira] [Issue Comment Deleted] (SPARK-29707) Make PartitionFilters and PushedFilters abbreviate configurable in metadata
[ https://issues.apache.org/jira/browse/SPARK-29707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-29707:
------------------------------
Comment: was deleted

(was: I am working on this.)

> Make PartitionFilters and PushedFilters abbreviate configurable in metadata
> ----------------------------------------------------------------------------
>
> Key: SPARK-29707
> URL: https://issues.apache.org/jira/browse/SPARK-29707
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Priority: Minor
> Attachments: screenshot-1.png
>
> !screenshot-1.png!
> It loses some key information.
> Related code:
> https://github.com/apache/spark/blob/ec5d698d99634e5bb8fc7b0fa1c270dd67c129c8/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L58-L66
[jira] [Updated] (SPARK-29979) Add basic/reserved property key constants in Table and SupportsNamespaces
[ https://issues.apache.org/jira/browse/SPARK-29979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-29979:
------------------------------
Summary: Add basic/reserved property key constants in Table and SupportsNamespaces  (was: Add property key constants in Table and SupportsNamespaces)

> Add basic/reserved property key constants in Table and SupportsNamespaces
> ---------------------------------------------------------------------------
>
> Key: SPARK-29979
> URL: https://issues.apache.org/jira/browse/SPARK-29979
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, some basic/reserved keys (e.g. "location", "comment") of table and namespace properties are hard-coded or defined in specific logical plan implementation classes. These keys can be centralized in the Table and SupportsNamespaces interfaces and shared across different implementation classes.
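A sketch of the centralization, with constant names chosen to match the ticket's examples; the exact names and their placement in the real DataSource V2 interfaces may differ.

{code:scala}
// Reserved property keys defined once and shared, instead of string literals
// scattered across logical plan implementations.
object TablePropertyKeys {
  val PROP_LOCATION = "location"
  val PROP_COMMENT  = "comment"
  val PROP_PROVIDER = "provider"
}

object NamespacePropertyKeys {
  val PROP_LOCATION = "location"
  val PROP_COMMENT  = "comment"
}

object PropertyKeysUsage {
  def main(args: Array[String]): Unit = {
    // Implementations look keys up through the shared constants.
    val tableProps = Map(TablePropertyKeys.PROP_LOCATION -> "/warehouse/t1")
    println(tableProps.get(TablePropertyKeys.PROP_LOCATION))
  }
}
{code}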
[jira] [Updated] (SPARK-29979) Add property key constants in Table and SupportsNamespaces
[ https://issues.apache.org/jira/browse/SPARK-29979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-29979:
------------------------------
Description:
Currently, some basic/reserved keys (e.g. "location", "comment") of table and namespace properties are hard-coded or defined in specific logical plan implementation classes. These keys can be centralized in the Table and SupportsNamespaces interfaces and shared across different implementation classes.

(was: Currently, for )

> Add property key constants in Table and SupportsNamespaces
> ------------------------------------------------------------
>
> Key: SPARK-29979
> URL: https://issues.apache.org/jira/browse/SPARK-29979
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, some basic/reserved keys (e.g. "location", "comment") of table and namespace properties are hard-coded or defined in specific logical plan implementation classes. These keys can be centralized in the Table and SupportsNamespaces interfaces and shared across different implementation classes.
[jira] [Created] (SPARK-29979) Add property constants in Table and SupportsNamespaces
Hu Fuwang created SPARK-29979:
------------------------------
Summary: Add property constants in Table and SupportsNamespaces
Key: SPARK-29979
URL: https://issues.apache.org/jira/browse/SPARK-29979
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
[jira] [Updated] (SPARK-29979) Add property constants in Table and SupportsNamespaces
[ https://issues.apache.org/jira/browse/SPARK-29979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-29979:
------------------------------
Description: Currently, for

> Add property constants in Table and SupportsNamespaces
> --------------------------------------------------------
>
> Key: SPARK-29979
> URL: https://issues.apache.org/jira/browse/SPARK-29979
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, for
[jira] [Updated] (SPARK-29979) Add property key constants in Table and SupportsNamespaces
[ https://issues.apache.org/jira/browse/SPARK-29979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-29979:
------------------------------
Summary: Add property key constants in Table and SupportsNamespaces  (was: Add property constants in Table and SupportsNamespaces)

> Add property key constants in Table and SupportsNamespaces
> ------------------------------------------------------------
>
> Key: SPARK-29979
> URL: https://issues.apache.org/jira/browse/SPARK-29979
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, for
[jira] [Commented] (SPARK-29859) ALTER DATABASE (SET LOCATION) should look up catalog like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972532#comment-16972532 ]

Hu Fuwang commented on SPARK-29859:
-----------------------------------
Working on this.

> ALTER DATABASE (SET LOCATION) should look up catalog like v2 commands
> ----------------------------------------------------------------------
>
> Key: SPARK-29859
> URL: https://issues.apache.org/jira/browse/SPARK-29859
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
[jira] [Commented] (SPARK-29858) ALTER DATABASE (SET DBPROPERTIES) should look up catalog like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972530#comment-16972530 ]

Hu Fuwang commented on SPARK-29858:
-----------------------------------
Working on this.

> ALTER DATABASE (SET DBPROPERTIES) should look up catalog like v2 commands
> ---------------------------------------------------------------------------
>
> Key: SPARK-29858
> URL: https://issues.apache.org/jira/browse/SPARK-29858
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
[jira] [Created] (SPARK-29859) ALTER DATABASE (SET LOCATION) should look up catalog like v2 commands
Hu Fuwang created SPARK-29859:
------------------------------
Summary: ALTER DATABASE (SET LOCATION) should look up catalog like v2 commands
Key: SPARK-29859
URL: https://issues.apache.org/jira/browse/SPARK-29859
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
[jira] [Created] (SPARK-29858) ALTER DATABASE (SET DBPROPERTIES) should look up catalog like v2 commands
Hu Fuwang created SPARK-29858:
------------------------------
Summary: ALTER DATABASE (SET DBPROPERTIES) should look up catalog like v2 commands
Key: SPARK-29858
URL: https://issues.apache.org/jira/browse/SPARK-29858
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
[jira] [Created] (SPARK-29834) DESC DATABASE should look up catalog like v2 commands
Hu Fuwang created SPARK-29834:
------------------------------
Summary: DESC DATABASE should look up catalog like v2 commands
Key: SPARK-29834
URL: https://issues.apache.org/jira/browse/SPARK-29834
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang
[jira] [Commented] (SPARK-29834) DESC DATABASE should look up catalog like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971345#comment-16971345 ]

Hu Fuwang commented on SPARK-29834:
-----------------------------------
Working on this.

> DESC DATABASE should look up catalog like v2 commands
> -------------------------------------------------------
>
> Key: SPARK-29834
> URL: https://issues.apache.org/jira/browse/SPARK-29834
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
[jira] [Commented] (SPARK-29707) Make PartitionFilters and PushedFilters abbreviate configurable in metadata
[ https://issues.apache.org/jira/browse/SPARK-29707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964655#comment-16964655 ]

Hu Fuwang commented on SPARK-29707:
-----------------------------------
I am working on this.

> Make PartitionFilters and PushedFilters abbreviate configurable in metadata
> ----------------------------------------------------------------------------
>
> Key: SPARK-29707
> URL: https://issues.apache.org/jira/browse/SPARK-29707
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Priority: Major
> Attachments: screenshot-1.png
>
> !screenshot-1.png!
> It loses some key information.
> Related code:
> https://github.com/apache/spark/blob/ec5d698d99634e5bb8fc7b0fa1c270dd67c129c8/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L58-L66
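A minimal model of the truncation being discussed; the 100-character cap mirrors the hard-coded limit in the linked DataSourceScanExec code, and making it configurable is the ticket's ask, so the larger cap below is purely illustrative.

{code:scala}
object MetadataAbbrevSketch {
  // Today the cap is a constant; the proposal is to read it from a config.
  def abbreviate(s: String, maxLength: Int = 100): String =
    if (s.length <= maxLength) s else s.take(maxLength - 3) + "..."

  def main(args: Array[String]): Unit = {
    val pushed = (1 to 20).map(i => s"IsNotNull(col$i)").mkString("PushedFilters: [", ", ", "]")
    println(abbreviate(pushed))       // trailing filters are lost
    println(abbreviate(pushed, 1000)) // a configurable cap keeps the key information
  }
}
{code}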
[jira] [Resolved] (SPARK-29615) Add insertInto method with byName parameter in DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-29615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang resolved SPARK-29615.
-------------------------------
Resolution: Not A Problem

> Add insertInto method with byName parameter in DataFrameWriter
> ----------------------------------------------------------------
>
> Key: SPARK-29615
> URL: https://issues.apache.org/jira/browse/SPARK-29615
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Hu Fuwang
> Priority: Major
>
> Currently, insertion through the DataFrameWriter.insertInto method ignores column names and just uses position-based resolution. As DataFrameWriter only has one public insertInto method, Spark users may not check the description of this method and may assume Spark will match the columns by name. In such cases, the wrong column may be used as the partition column, which may cause problems (e.g. a huge number of files/folders may be created in the Hive table tmp location).
> We propose to add a new insertInto method in DataFrameWriter with a byName parameter for Spark users to specify whether to match columns by name.
[jira] [Created] (SPARK-29615) Add insertInto method with byName parameter in DataFrameWriter
Hu Fuwang created SPARK-29615:
------------------------------
Summary: Add insertInto method with byName parameter in DataFrameWriter
Key: SPARK-29615
URL: https://issues.apache.org/jira/browse/SPARK-29615
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Hu Fuwang

Currently, insertion through the DataFrameWriter.insertInto method ignores column names and just uses position-based resolution. As DataFrameWriter only has one public insertInto method, Spark users may not check the description of this method and may assume Spark will match the columns by name. In such cases, the wrong column may be used as the partition column, which may cause problems (e.g. a huge number of files/folders may be created in the Hive table tmp location).

We propose to add a new insertInto method in DataFrameWriter with a byName parameter for Spark users to specify whether to match columns by name.
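The proposed byName variant does not exist in the reported version, so here is a user-side sketch of the same effect: reorder the DataFrame to the target table's column order before the positional insert.

{code:scala}
import org.apache.spark.sql.{DataFrame, SparkSession}

object InsertIntoByNameSketch {
  // Align df's columns with the target table's column order so the
  // position-based insertInto behaves like a by-name insert.
  def insertIntoByName(spark: SparkSession, df: DataFrame, table: String): Unit = {
    val targetColumns = spark.table(table).schema.fieldNames
    df.select(targetColumns.map(df.col): _*).write.insertInto(table)
  }
}
{code}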
[jira] [Commented] (SPARK-28512) New optional mode: throw runtime exceptions on casting failures
[ https://issues.apache.org/jira/browse/SPARK-28512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960370#comment-16960370 ]

Hu Fuwang commented on SPARK-28512:
-----------------------------------
[~Gengliang.Wang] Is this already solved by https://github.com/apache/spark/pull/25997 ?

> New optional mode: throw runtime exceptions on casting failures
> -----------------------------------------------------------------
>
> Key: SPARK-28512
> URL: https://issues.apache.org/jira/browse/SPARK-28512
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Gengliang Wang
> Priority: Major
>
> In popular DBMSes like MySQL/PostgreSQL/Oracle, runtime exceptions are thrown on casting failures, e.g. cast('abc' as boolean).
> In Spark, the result is silently converted to null. This is by design, since we don't want a long-running job aborted by some casting failure. But there are scenarios where users want to make sure all data conversions are correct, the way they would in MySQL/PostgreSQL/Oracle.
> This one has a bigger scope than https://issues.apache.org/jira/browse/SPARK-28741
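For context, a tiny demonstration of the default behavior the ticket describes; whether the referenced PR covers this exact case is the open question above, so the strict mode is only mentioned in a comment rather than demonstrated.

{code:scala}
import org.apache.spark.sql.SparkSession

object SilentCastDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    // Default behavior: the failed conversion becomes null instead of an error.
    spark.sql("SELECT CAST('abc' AS boolean)").show()
    // The ticket asks for an optional mode where such casts raise a runtime
    // exception, as MySQL/PostgreSQL/Oracle do.
    spark.stop()
  }
}
{code}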
[jira] [Issue Comment Deleted] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong
[ https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-29586:
------------------------------
Comment: was deleted

(was: I am working on this.)

> spark jdbc method param lowerBound and upperBound DataType wrong
> -----------------------------------------------------------------
>
> Key: SPARK-29586
> URL: https://issues.apache.org/jira/browse/SPARK-29586
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.4, 3.0.0
> Reporter: daile
> Priority: Major
>
> {code:java}
> private def toBoundValueInWhereClause(
>     value: Long,
>     columnType: DataType,
>     timeZoneId: String): String = {
>   def dateTimeToString(): String = {
>     val dateTimeStr = columnType match {
>       case DateType => DateFormatter().format(value.toInt)
>       case TimestampType =>
>         val timestampFormatter = TimestampFormatter.getFractionFormatter(
>           DateTimeUtils.getZoneId(timeZoneId))
>         DateTimeUtils.timestampToString(timestampFormatter, value)
>     }
>     s"'$dateTimeStr'"
>   }
>   columnType match {
>     case _: NumericType => value.toString
>     case DateType | TimestampType => dateTimeToString()
>   }
> }
> {code}
> partitionColumn supports NumericType, DateType and TimestampType, but the jdbc method only accepts Long.
> {code:java}
> test("jdbc Suite2") {
>   val df = spark
>     .read
>     .option("partitionColumn", "B")
>     .option("lowerBound", "2017-01-01 10:00:00")
>     .option("upperBound", "2019-01-01 10:00:00")
>     .option("numPartitions", 5)
>     .jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new Properties())
>   df.printSchema()
>   df.show()
> }
> {code}
> It's OK.
> {code:java}
> test("jdbc Suite") {
>   val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties())
>   df.printSchema()
>   df.show()
> }
> {code}
> {code:java}
> java.lang.IllegalArgumentException: Cannot parse the bound value 1571899768024 as date
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
>   at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
>   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240)
>   at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297)
>   at org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
>   at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
>   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
>   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
>   at org.apache.spark.sql.jdbc.JDBCSuite.org$scalatest$BeforeAndAfter$$super$runTest(JDBCSuite.scala:43)
>   at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203)
>   at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192)
> {code}
[jira] [Commented] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong
[ https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958610#comment-16958610 ] Hu Fuwang commented on SPARK-29586: --- I am working on this. > spark jdbc method param lowerBound and upperBound DataType wrong > > > Key: SPARK-29586 > URL: https://issues.apache.org/jira/browse/SPARK-29586 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: daile >Priority: Major > > > {code:java} > private def toBoundValueInWhereClause( > value: Long, > columnType: DataType, > timeZoneId: String): String = { > def dateTimeToString(): String = { > val dateTimeStr = columnType match { > case DateType => DateFormatter().format(value.toInt) > case TimestampType => > val timestampFormatter = TimestampFormatter.getFractionFormatter( > DateTimeUtils.getZoneId(timeZoneId)) > DateTimeUtils.timestampToString(timestampFormatter, value) > } > s"'$dateTimeStr'" > } > columnType match { > case _: NumericType => value.toString > case DateType | TimestampType => dateTimeToString() > } > }{code} > partitionColumn supoort NumericType, TimestampType, TimestampType but jdbc > method only accept Long > test("jdbc Suite2") { > val df = spark > .read > .option("partitionColumn", "B") > .option("lowerBound", "2017-01-01 10:00:00") > .option("upperBound", "2019-01-01 10:00:00") > .option("numPartitions", 5) > .jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new Properties()) > df.printSchema() > df.show() > } > test("jdbc Suite2") { > val df = spark > .read > .option("partitionColumn", "B") > .option("lowerBound", "2017-01-01 10:00:00") > .option("upperBound", "2019-01-01 10:00:00") > .option("numPartitions", 5) > .jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new Properties()) > df.printSchema() > df.show() > } > test("jdbc Suite") { > val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", > 1571899768024L, 1571899768024L, 5, new Properties()) > df.printSchema() > df.show() > } > java.lang.IllegalArgumentException: Cannot parse the bound value > 1571899768024 as date > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297) > at > org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at 
org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) > at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56) > at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) > at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) > at >
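To make the mismatch in the report above concrete, the two call sites below are adapted from its reproduction. This is a minimal sketch, assuming a SparkSession `spark`, a JDBC URL `urlWithUserAndPass`, and a table TEST.TIMETYPES whose partition column B is a timestamp (all names taken from the report):
{code:java}
import java.util.Properties

// Option-based API: the bounds are strings, so date/timestamp literals
// can be supplied for a TIMESTAMP partition column.
val viaOptions = spark.read
  .option("partitionColumn", "B")
  .option("lowerBound", "2017-01-01 10:00:00")
  .option("upperBound", "2019-01-01 10:00:00")
  .option("numPartitions", 5)
  .jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new Properties())

// Explicit overload: the bounds are typed as Long, so a timestamp can only
// be expressed as epoch milliseconds -- which the internal bound parser then
// rejects with "Cannot parse the bound value ... as date".
val viaOverload = spark.read.jdbc(
  urlWithUserAndPass, "TEST.TIMETYPES", "B",
  1571899768024L, 1571899768024L, 5, new Properties())
{code}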
[jira] [Commented] (SPARK-21287) Cannot use Int.MIN_VALUE as Spark SQL fetchsize
[ https://issues.apache.org/jira/browse/SPARK-21287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957939#comment-16957939 ] Hu Fuwang commented on SPARK-21287: --- [~smilegator] [~srowen] Just submitted a PR for this: [https://github.com/apache/spark/pull/26230]. Please help review. > Cannot use Int.MIN_VALUE as Spark SQL fetchsize > --- > > Key: SPARK-21287 > URL: https://issues.apache.org/jira/browse/SPARK-21287 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 >Reporter: Maciej Bryński >Priority: Major > > The MySQL JDBC driver makes it possible not to store the ResultSet in memory. > We can do this by setting fetchSize to Int.MIN_VALUE. > Unfortunately, Spark rejects this configuration. > {code} > java.lang.IllegalArgumentException: requirement failed: Invalid value > `-2147483648` for parameter `fetchsize`. The minimum value is 0. When the > value is 0, the JDBC driver ignores the value and does the estimates. > at scala.Predef$.require(Predef.scala:224) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:105) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:34) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206) > at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:214) > at java.lang.Thread.run(Thread.java:748) > {code} > https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
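As a sketch of the rejected configuration (assuming a SparkSession `spark` and a MySQL JDBC URL `mysqlUrl`; both names and the table name are illustrative), the streaming hint that MySQL's Connector/J expects is exactly the value Spark's option validation forbids:
{code:java}
// MySQL Connector/J streams rows one at a time only when fetchSize is
// exactly Integer.MIN_VALUE; any other non-zero value buffers the ResultSet.
val df = spark.read
  .format("jdbc")
  .option("url", mysqlUrl)
  .option("dbtable", "some_table")            // illustrative table name
  .option("fetchsize", Int.MinValue.toString) // -2147483648
  .load()  // fails: "Invalid value `-2147483648` for parameter `fetchsize`"
{code}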
[jira] [Resolved] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang resolved SPARK-29541. --- Resolution: Not A Problem > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per the Spark build guide > [https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually] > , individual modules can be built separately with the command: > {code:java} > ./build/mvn -pl : clean install{code} > However, for the modules below, the build command above fails: > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is a snapshot of a sample failure of such a build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
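Consistent with the Not A Problem resolution above, the modules listed sit behind Maven build profiles, so a single-module build needs the matching profile flag rather than a change to the parent pom. A concrete instance of the quoted command, using one of the listed module paths (the profile/module pairing here is illustrative):
{code:java}
# Build only the Thrift server module; -Phive-thriftserver activates the
# profile that includes it, and -pl selects the module by directory path.
./build/mvn -Phive-thriftserver -pl sql/hive-thriftserver clean install
{code}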
[jira] [Commented] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956748#comment-16956748 ] Hu Fuwang commented on SPARK-29541: --- [~Qin Yao] Yes, thank you, Kent. Will close this JIRA. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per the Spark build guide > [https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually] > , individual modules can be built separately with the command: > {code:java} > ./build/mvn -pl : clean install{code} > However, for the modules below, the build command above fails: > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is a snapshot of a sample failure of such a build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark build guide [https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. was: Per Spark [link title|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]], individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark build guide > [https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually] > , individual module can be built separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark [link title|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]], individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. was: Per Spark [build guide|http://www.google.com], individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [link > title|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]], > individual module can be built separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark [build guide|http://www.google.com], individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. was: Per Spark [link title|http://example.com], individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [build guide|http://www.google.com], individual module can be built > separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark [link title|http://example.com], individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. was: Per Spark [build guide|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [link title|http://example.com], individual module can be built > separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark [build guide|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. was: Per Spark [build guide|#building-submodules-individually] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [build > guide|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]] > , individual module can be built separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark [build guide|#building-submodules-individually] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. was: Per Spark [build guide|#building-submodules-individually]] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [build guide|#building-submodules-individually] , individual module > can be built separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark [build guide|#building-submodules-individually]] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. was: Per Spark [build guide|#building-submodules-individually]] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. This PR add the missing modules above into the module list in spark-parent pom file, to make these individual module builds work as expected. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [build guide|#building-submodules-individually]] , individual > module can be built separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Description: Per Spark [build guide|#building-submodules-individually]] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. This PR add the missing modules above into the module list in spark-parent pom file, to make these individual module builds work as expected. was: Per Spark [build guide|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]] , individual module can be built separately with command : {code:java} ./build/mvn -pl : clean install{code} However, for below modules, the build command above failed : {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is the snapshot of a sample failure of such build. > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [build guide|#building-submodules-individually]] , individual > module can be built separately with command : > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. > This PR add the missing modules above into the module list in spark-parent > pom file, to make these individual module builds work as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29541) Add missing modules in spark-parent pom file
[ https://issues.apache.org/jira/browse/SPARK-29541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hu Fuwang updated SPARK-29541: -- Attachment: individual-module-build-failure.png > Add missing modules in spark-parent pom file > > > Key: SPARK-29541 > URL: https://issues.apache.org/jira/browse/SPARK-29541 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Hu Fuwang >Priority: Major > Attachments: individual-module-build-failure.png > > > Per Spark [build > guide|[https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually]] > , individual module can be built separately with command : > > {code:java} > ./build/mvn -pl : clean install{code} > However, for below modules, the build command above failed : > {code:java} > common/network-yarn > external/docker-integration-tests > external/kinesis-asl-assembly > external/kinesis-asl > external/spark-ganglia-lgpl > hadoop-cloud > resource-managers/mesos > resource-managers/yarn > sql/hive-thriftserver > {code} > Attached is the snapshot of a sample failure of such build. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29541) Add missing modules in spark-parent pom file
Hu Fuwang created SPARK-29541: - Summary: Add missing modules in spark-parent pom file Key: SPARK-29541 URL: https://issues.apache.org/jira/browse/SPARK-29541 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.0.0 Reporter: Hu Fuwang Per the Spark [build guide|https://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually], individual modules can be built separately with the command: {code:java} ./build/mvn -pl : clean install{code} However, for the modules below, the build command above fails: {code:java} common/network-yarn external/docker-integration-tests external/kinesis-asl-assembly external/kinesis-asl external/spark-ganglia-lgpl hadoop-cloud resource-managers/mesos resource-managers/yarn sql/hive-thriftserver {code} Attached is a snapshot of a sample failure of such a build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29531) Refine ThriftServerQueryTestSuite.blackList to reuse code of SQLQueryTestSuite.blackList
Hu Fuwang created SPARK-29531: - Summary: Refine ThriftServerQueryTestSuite.blackList to reuse code of SQLQueryTestSuite.blackList Key: SPARK-29531 URL: https://issues.apache.org/jira/browse/SPARK-29531 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.0.0 Reporter: Hu Fuwang Refine the test code in ThriftServerQueryTestSuite.blackList so that it reuses the code of SQLQueryTestSuite.blackList. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
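A hypothetical sketch of the shape of that reuse (not the actual Spark test code; the member name, its type, and the file entries are all illustrative, assuming the ignore-list is exposed as an overridable Set[String]):
{code:java}
// Toy model of the intended refactoring: the Thrift-server suite inherits
// the base suite's ignore-list instead of redeclaring every entry.
class SQLQueryTestSuite {
  def blackList: Set[String] = Set("blacklist.sql") // entries illustrative
}

class ThriftServerQueryTestSuite extends SQLQueryTestSuite {
  // Reuse the parent's list and add only Thrift-server-specific entries.
  override def blackList: Set[String] =
    super.blackList ++ Set("describe.sql") // illustrative entry
}
{code}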