[spark-website] branch asf-site updated: Update Spark 3.4 release window (#407)

2022-07-22 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new ad25dd72b Update Spark 3.4 release window (#407)
ad25dd72b is described below

commit ad25dd72b599178afd2390e131869b78d877b5c4
Author: Xinrong Meng 
AuthorDate: Fri Jul 22 17:37:32 2022 -0700

Update Spark 3.4 release window (#407)
---
 site/versioning-policy.html | 8 ++++----
 versioning-policy.md        | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index 0851b5be3..437149a49 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -250,7 +250,7 @@ available APIs.
 generally be released about 6 months after 2.2.0. Maintenance releases happen as needed
 in between feature releases. Major releases do not happen according to a fixed schedule.
 
-Spark 3.3 release window
+Spark 3.4 release window
 
 
   
@@ -261,15 +261,15 @@ in between feature releases. Major releases do not happen according to a fixed s
   
   
 
-  March 15th 2022
+  January 15th 2023
   Code freeze. Release branch cut.
 
 
-  Late March 2022
+  Late January 2023
  QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
 
 
-  April 2022
+  February 2023
   Release candidates (RC), voting, etc. until final release passes
 
   
diff --git a/versioning-policy.md b/versioning-policy.md
index 55a0bd331..c1136de67 100644
--- a/versioning-policy.md
+++ b/versioning-policy.md
@@ -103,13 +103,13 @@ In general, feature ("minor") releases occur about every 6 months. Hence, Spark
 generally be released about 6 months after 2.2.0. Maintenance releases happen as needed
 in between feature releases. Major releases do not happen according to a fixed schedule.
 
-Spark 3.3 release window
+Spark 3.4 release window
 
 | Date  | Event |
 | - | - |
-| March 15th 2022 | Code freeze. Release branch cut.|
-| Late March 2022 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.|
-| April 2022 | Release candidates (RC), voting, etc. until final release passes|
+| January 15th 2023 | Code freeze. Release branch cut.|
+| Late January 2023 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.|
+| February 2023 | Release candidates (RC), voting, etc. until final release passes|
 
 Maintenance releases and EOL
 





[GitHub] [spark-website] dongjoon-hyun merged pull request #407: Update Spark 3.4 release window

2022-07-22 Thread GitBox


dongjoon-hyun merged PR #407:
URL: https://github.com/apache/spark-website/pull/407





[GitHub] [spark-website] dongjoon-hyun commented on pull request #407: Update Spark 3.4 release window

2022-07-22 Thread GitBox


dongjoon-hyun commented on PR #407:
URL: https://github.com/apache/spark-website/pull/407#issuecomment-1193022384

   According to the discussion on the dev mailing list, I'll merge this.





[spark] branch master updated: [SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

2022-07-22 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e2b1ae1021 [SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter
2e2b1ae1021 is described below

commit 2e2b1ae1021bc4bc99f9749e05e4770be3aec43f
Author: huaxingao 
AuthorDate: Fri Jul 22 13:49:00 2022 -0700

[SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

### What changes were proposed in this pull request?

Even though a literal value can appear on either side of a filter, e.g. both `a > 1` and `1 < a` are valid, after translating a Catalyst Expression to a data source filter we want the literal value on the right side, since that makes the filters easier for the data source to handle. We already do this kind of normalization for V1 Filters; V2 Filters should behave the same way.

Before this PR, filters with the literal value on the left side, e.g. `1 > a`, were kept as is. After this PR, they are normalized, e.g. to `a < 1`, so the data source doesn't need to check each of the filters (and do the flip itself).
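
To make the behavior concrete, here is a minimal, self-contained sketch of the normalization (a toy model for illustration only; `FlipDemo`, `Field`, `Lit`, and `Cmp` are illustrative stand-ins, not Spark's actual `FieldReference`/`LiteralValue`/`V2Predicate` classes):

```scala
object FlipDemo {
  // Toy model of a binary comparison: op(left, right).
  sealed trait Expr
  final case class Field(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Cmp(op: String, left: Expr, right: Expr)

  // Mirror-image operator name, in the spirit of flipComparisonOperatorName.
  def flipOp(op: String): String = op match {
    case ">"  => "<"
    case "<"  => ">"
    case ">=" => "<="
    case "<=" => ">="
    case other => other // "=" and "<=>" are symmetric; nothing to flip
  }

  // Rewrite literal-on-the-left comparisons so the field reference comes first.
  def normalize(c: Cmp): Cmp = c match {
    case Cmp(op, l: Lit, r: Field) => Cmp(flipOp(op), r, l)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // "1 > a" becomes "a < 1"
    println(normalize(Cmp(">", Lit(1), Field("a")))) // Cmp(<,Field(a),Lit(1))
  }
}
```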

### Why are the changes needed?
I think we should follow the V1 Filter behavior and normalize the filters at Catalyst Expression to DS Filter translation time, putting the literal values on the right side, so that later on the data source doesn't need to check every single filter to figure out whether it needs to flip the sides.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?
A new test in `DataSourceV2StrategySuite`.
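
The archive truncates before the test body is shown; purely as an illustration, the kind of property such a test checks can be sketched against the toy model above (hypothetical code, not the actual suite):

```scala
// Hypothetical check, reusing FlipDemo from the sketch above: translating
// "1 > a" must yield the same normalized predicate as translating "a < 1".
import FlipDemo._

object FlipCheck {
  def main(args: Array[String]): Unit = {
    val fromLeftLiteral = normalize(Cmp(">", Lit(1), Field("a")))
    val alreadyNormal   = normalize(Cmp("<", Field("a"), Lit(1)))
    assert(fromLeftLiteral == alreadyNormal) // both are Cmp("<", Field("a"), Lit(1))
  }
}
```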

Closes #37197 from huaxingao/flip.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
---
 .../sql/catalyst/util/V2ExpressionBuilder.scala    | 21 +++
 .../datasources/v2/DataSourceV2StrategySuite.scala | 67 +-
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
index 8bb65a88044..59cbcf48334 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
@@ -233,6 +233,10 @@ class V2ExpressionBuilder(e: Expression, isPredicate: Boolean = false) {
       val r = generateExpression(b.right)
       if (l.isDefined && r.isDefined) {
         b match {
+          case _: Predicate if isBinaryComparisonOperator(b.sqlOperator) &&
+              l.get.isInstanceOf[LiteralValue[_]] && r.get.isInstanceOf[FieldReference] =>
+            Some(new V2Predicate(flipComparisonOperatorName(b.sqlOperator),
+              Array[V2Expression](r.get, l.get)))
           case _: Predicate =>
             Some(new V2Predicate(b.sqlOperator, Array[V2Expression](l.get, r.get)))
           case _ =>
@@ -408,6 +412,23 @@ class V2ExpressionBuilder(e: Expression, isPredicate: Boolean = false) {
       }
     case _ => None
   }
+
+  private def isBinaryComparisonOperator(operatorName: String): Boolean = {
+    operatorName match {
+      case ">" | "<" | ">=" | "<=" | "=" | "<=>" => true
+      case _ => false
+    }
+  }
+
+  private def flipComparisonOperatorName(operatorName: String): String = {
+    operatorName match {
+      case ">" => "<"
+      case "<" => ">"
+      case ">=" => "<="
+      case "<=" => ">="
+      case _ => operatorName
+    }
+  }
 }
 
 object ColumnOrField {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala
index 66dc65cf681..c3f51bed269 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala
@@ -18,14 +18,77 @@
 package org.apache.spark.sql.execution.datasources.v2

 import org.apache.spark.sql.catalyst.dsl.expressions._
-import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.connector.expressions.{FieldReference, LiteralValue}
 import org.apache.spark.sql.connector.expressions.filter.Predicate
 import org.apache.spark.sql.test.SharedSparkSession
-import org.apache.spark.sql.types.BooleanType
+import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType, StructField, StructType}

 class DataSourceV2StrategySuite extends PlanTest with SharedSparkSession {

[spark] branch master updated: [SPARK-38597][K8S][INFRA] Enable Spark on K8S integration tests

2022-07-22 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e6aab49a04 [SPARK-38597][K8S][INFRA] Enable Spark on K8S integration tests
5e6aab49a04 is described below

commit 5e6aab49a046c19e85f2177df440c38c7277dc08
Author: Yikun Jiang 
AuthorDate: Fri Jul 22 09:21:08 2022 -0700

[SPARK-38597][K8S][INFRA] Enable Spark on K8S integration tests

### What changes were proposed in this pull request?
Enable Spark on K8S integration tests in GitHub Actions, based on minikube:
- The K8S IT will always be triggered in user fork repos and on `apache/spark` commits merged to the master branch.
- This PR does NOT contain the Volcano-related tests, due to the limited resources of GitHub Actions.
- minikube installation is allowed by Apache Infra: [INFRA-23000](https://issues.apache.org/jira/projects/INFRA/issues/INFRA-23000)
- Why set the driver to 0.5 CPU and the executors to 0.2 CPU? (A sketch applying these numbers follows the lists below.)
  * GitHub-hosted runner hardware is limited ([2U7G](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources)), so CPU is very scarce.
  * IT job available CPU = 2U - 0.85U (K8S deploy) = 1.15U.
  * The 1.15 CPU left after the K8S installation meets the requirement of the K8S tests: one 0.5-CPU driver plus up to three 0.2-CPU executors is at most 1.1 CPU.
  * For memory: 6947MB is the maximum (otherwise minikube raises `Exiting due to RSRC_OVER_ALLOC_MEM: Requested memory allocation 7168MB is more than your system limit 6947MB.`), but that is not an integer multiple of 1024, so I set it to 6144 for better resource accounting.

- Time cost info:

  * 14 mins to compile related code.
  * 3 mins to build docker images.
  * 20-30 mins to test
  * Total: about 30-40 mins
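
For concreteness, a hedged sketch of a workflow step applying these numbers (the step name and exact invocation are illustrative, not necessarily what this commit uses; `--cpus` and `--memory` are real `minikube start` flags):

```yaml
# Illustrative only: start minikube on the 2-CPU / 7GB hosted runner using
# the figures above. K8S itself takes ~0.85 CPU, leaving ~1.15 CPU for one
# 0.5-CPU driver plus up to three 0.2-CPU executors (1.1 CPU total).
- name: Start Minikube
  run: minikube start --cpus 2 --memory 6144
```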

### Why are the changes needed?

This will also improve the efficiency of K8S development and, to some extent, guarantee the quality of Spark on K8S and of the Spark Docker image.

### Does this PR introduce _any_ user-facing change?
No, dev only.

### How was this patch tested?
CI passed

Closes #35830

Closes #37244 from Yikun/SPARK-38597-k8s-it.

Authored-by: Yikun Jiang 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 73 +++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 02b799891fd..1902468e90c 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -99,7 +99,8 @@ jobs:
               \"docker-integration-tests\": \"$docker\",
               \"scala-213\": \"true\",
               \"java-11-17\": \"true\",
-              \"lint\" : \"true\"
+              \"lint\" : \"true\",
+              \"k8s-integration-tests\" : \"true\",
             }"
           echo $precondition # For debugging
           # GitHub Actions set-output doesn't take newlines
@@ -869,3 +870,73 @@ jobs:
       with:
         name: unit-tests-log-docker-integration--8-${{ inputs.hadoop }}-hive2.3
         path: "**/target/unit-tests.log"
+
+  k8s-integration-tests:
+    needs: precondition
+    if: fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true'
+    name: Run Spark on Kubernetes Integration test
+    runs-on: ubuntu-20.04
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+      with:
+        fetch-depth: 0
+        repository: apache/spark
+        ref: ${{ inputs.branch }}
+    - name: Sync the current branch with the latest in Apache Spark
+      if: github.repository != 'apache/spark'
+      run: |
+        echo "APACHE_SPARK_REF=$(git rev-parse HEAD)" >> $GITHUB_ENV
+        git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/}
+        git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' merge --no-commit --progress --squash FETCH_HEAD
+        git -c user.name='Apache Spark Test Account' -c user.email='sparktest...@gmail.com' commit -m "Merged commit" --allow-empty
+    - name: Cache Scala, SBT and Maven
+      uses: actions/cache@v2
+      with:
+        path: |
+          build/apache-maven-*
+          build/scala-*
+          build/*.jar
+          ~/.sbt
+        key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
+        restore-keys: |
+          build-
+    - name: Cache Coursier local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.cache/coursier
+        key: k8s-integration-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        restore-keys: |
+