[spark] branch master updated (1589d32 -> 6d30991)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1589d32  [SPARK-35472][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.generic
     add 6d30991  [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase

No new revisions were added by this update.

Summary of changes:
 python/pyspark/context.py           |  5 +-
 python/pyspark/ml/classification.py |  4 +-
 python/pyspark/ml/tuning.py         | 10 ++--
 python/pyspark/util.py              | 99 +
 4 files changed, 79 insertions(+), 39 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9eaf678 -> 1589d32)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9eaf678  [SPARK-35830][TESTS] Upgrade sbt-mima-plugin to 0.9.2
     add 1589d32  [SPARK-35472][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.generic

No new revisions were added by this update.

Summary of changes:
 python/mypy.ini                  |   3 -
 python/pyspark/pandas/frame.py   |  19 +++-
 python/pyspark/pandas/generic.py | 223 ---
 python/pyspark/pandas/series.py  |  19 +++-
 4 files changed, 167 insertions(+), 97 deletions(-)
[spark] branch master updated (86bcd1f -> 9eaf678)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 86bcd1f  [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType
     add 9eaf678  [SPARK-35830][TESTS] Upgrade sbt-mima-plugin to 0.9.2

No new revisions were added by this update.

Summary of changes:
 project/plugins.sbt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 86bcd1f  [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType
86bcd1f is described below

commit 86bcd1fba09d9b5e4d36a48824354aaae769fa21
Author: Angerszh
AuthorDate: Sat Jun 19 21:43:06 2021 +0300

    [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType

    ### What changes were proposed in this pull request?
    Support Cast between different field YearMonthIntervalTypes.

    ### Why are the changes needed?
    To make it convenient for users to convert between different field YearMonthIntervalTypes.

    ### Does this PR introduce _any_ user-facing change?
    Users can cast YearMonthIntervalType(YEAR, MONTH) to YearMonthIntervalType(YEAR, YEAR), etc.

    ### How was this patch tested?
    Added a unit test.

    Closes #32974 from AngersZh/SPARK-35819.

    Authored-by: Angerszh
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/expressions/Cast.scala  | 12 ++++++++++++
 .../apache/spark/sql/catalyst/expressions/CastSuite.scala | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 52801ec..cdf0753 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -82,6 +82,8 @@ object Cast {
     case (StringType, _: DayTimeIntervalType) => true
     case (StringType, _: YearMonthIntervalType) => true
+    case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true
+
     case (StringType, _: NumericType) => true
     case (BooleanType, _: NumericType) => true
     case (DateType, _: NumericType) => true
@@ -580,6 +582,8 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
       it: YearMonthIntervalType): Any => Any = from match {
     case StringType => buildCast[UTF8String](_, s =>
       IntervalUtils.castStringToYMInterval(s, it.startField, it.endField))
+    case _: YearMonthIntervalType => buildCast[Int](_, s =>
+      IntervalUtils.periodToMonths(IntervalUtils.monthsToPeriod(s), it.endField))
   }

   // LongConverter
@@ -1481,6 +1485,12 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
         code"""
           $evPrim = $util.castStringToYMInterval($c, (byte)${it.startField}, (byte)${it.endField});
         """
+    case _: YearMonthIntervalType =>
+      val util = IntervalUtils.getClass.getCanonicalName.stripSuffix("$")
+      (c, evPrim, _) =>
+        code"""
+          $evPrim = $util.periodToMonths($util.monthsToPeriod($c), (byte)${it.endField});
+        """
   }

   private[this] def decimalToTimestampCode(d: ExprValue): Block = {
@@ -2051,6 +2061,8 @@ object AnsiCast {
     case (StringType, _: DayTimeIntervalType) => true
     case (StringType, _: YearMonthIntervalType) => true
+    case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true
+
     case (StringType, DateType) => true
     case (TimestampType, DateType) => true
     case (TimestampWithoutTZType, DateType) => true
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index d114968..51c3681 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -30,6 +30,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
+import org.apache.spark.sql.types.YearMonthIntervalType._
 import org.apache.spark.unsafe.types.UTF8String

 /**
@@ -662,4 +663,14 @@ class CastSuite extends CastSuiteBase {
       checkEvaluation(cast(invalidInput, TimestampWithoutTZType), null)
     }
   }
+
+  test("SPARK-35819: Support cast YearMonthIntervalType in different fields") {
+    val ym = cast(Literal.create("1-1"), YearMonthIntervalType(YEAR, MONTH))
+    Seq(YearMonthIntervalType(YEAR, YEAR) -> 12,
+      YearMonthIntervalType(YEAR, MONTH) -> 13,
+      YearMonthIntervalType(MONTH, MONTH) -> 13)
+      .foreach { case (dt, value) =>
+        checkEvaluation(cast(ym, dt), value)
+      }
+  }
 }
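The new cast path above converts the interval's underlying month count to a Period and back, truncating at the target end field — which is why the suite expects `"1-1"` (13 months) to become 12 when cast to `YearMonthIntervalType(YEAR, YEAR)`. A rough model of that truncation in plain Python (hypothetical helper names mirroring `IntervalUtils.monthsToPeriod`/`periodToMonths`; not Spark's actual code, and assuming non-negative intervals):

```python
# Field codes, mirroring YearMonthIntervalType.YEAR / MONTH.
YEAR, MONTH = 0, 1

def months_to_period(months: int):
    """Split a month count into (years, months), like IntervalUtils.monthsToPeriod."""
    return months // 12, months % 12

def period_to_months(period, end_field: int) -> int:
    """Collapse (years, months) back to months, truncating below end_field,
    like IntervalUtils.periodToMonths(period, endField)."""
    years, months = period
    if end_field == YEAR:
        return years * 12       # end field YEAR: drop the leftover months
    return years * 12 + months  # end field MONTH: keep full month precision

# INTERVAL '1-1' YEAR TO MONTH is 13 months internally.
ym = 13
print(period_to_months(months_to_period(ym), YEAR))   # 12
print(period_to_months(months_to_period(ym), MONTH))  # 13
```

This matches the expectations in the added test: 12 for an end field of YEAR, 13 when the end field is MONTH.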
[spark] branch master updated (aab37ed -> a39f1ea)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from aab37ed  [SPARK-35593][K8S][TESTS][FOLLOWUP] Run KubernetesLocalDiskShuffleDataIOSuite on a dedicated JVM
     add a39f1ea  [SPARK-35824][CORE][TESTS] Convert LevelDBSuite.IntKeyType from a nested class to a normal class

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/util/kvstore/DBIteratorSuite.java |  2 +-
 .../{ArrayKeyIndexType.java => IntKeyType.java}    | 19 ---
 .../apache/spark/util/kvstore/LevelDBSuite.java    | 27 --
 3 files changed, 12 insertions(+), 36 deletions(-)
 copy common/kvstore/src/test/java/org/apache/spark/util/kvstore/{ArrayKeyIndexType.java => IntKeyType.java} (75%)
[spark] branch master updated (74d647d -> aab37ed)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 74d647d  [SPARK-35825][INFRA] Increase the heap and stack size for Maven build
     add aab37ed  [SPARK-35593][K8S][TESTS][FOLLOWUP] Run KubernetesLocalDiskShuffleDataIOSuite on a dedicated JVM

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-35825][INFRA] Increase the heap and stack size for Maven build
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 74d647d  [SPARK-35825][INFRA] Increase the heap and stack size for Maven build
74d647d is described below

commit 74d647d2ca6b0471f0eb90a59bccb1ecc0a9cc8f
Author: Gengliang Wang
AuthorDate: Sat Jun 19 10:44:46 2021 -0700

    [SPARK-35825][INFRA] Increase the heap and stack size for Maven build

    ### What changes were proposed in this pull request?
    Increase the memory configuration for the Maven build:
    - Stack size: 64MB => 128MB
    - Initial heap size: 1024MB => 2048MB
    - Maximum heap size: 1024MB => 2048MB

    The SBT builds are OK, so let's keep their current configuration.

    ### Why are the changes needed?
    The Jenkins jobs are unstable due to stack-overflow errors:
    - https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-jdk-11/
    - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2274/

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Jenkins test.

    Closes #32961 from gengliangwang/increaseXss.

    Authored-by: Gengliang Wang
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pom.xml b/pom.xml
index e76c5f9..8bcb3a8 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2604,9 +2604,9 @@
           -P:silencer:globalFilters=.*deprecated.*
-          -Xss64m
-          -Xms1024m
-          -Xmx1024m
+          -Xss128m
+          -Xms2048m
+          -Xmx2048m
           -XX:ReservedCodeCacheSize=${CodeCacheSize}
[spark] branch master updated: [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2ebad72  [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type
2ebad72 is described below

commit 2ebad727587e25b8bf4a8439593e7402ea4e2827
Author: Angerszh
AuthorDate: Sat Jun 19 13:51:21 2021 +0300

    [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type

    ### What changes were proposed in this pull request?
    Support truncating java.time.Duration by fields of the day-time interval type.

    ### Why are the changes needed?
    To respect fields of the target day-time interval types.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Added a unit test.

    Closes #32950 from AngersZh/SPARK-35726.

    Authored-by: Angerszh
    Signed-off-by: Max Gekk
---
 .../spark/sql/catalyst/CatalystTypeConverters.scala | 11 ++-
 .../spark/sql/catalyst/util/IntervalUtils.scala     | 15 +--
 .../org/apache/spark/sql/RandomDataGenerator.scala  | 13 -
 .../sql/catalyst/CatalystTypeConvertersSuite.scala  | 20
 4 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
index 08a5fd5..1742524 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
@@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.util._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
+import org.apache.spark.sql.types.DayTimeIntervalType._
 import org.apache.spark.sql.types.YearMonthIntervalType._
 import org.apache.spark.unsafe.types.UTF8String
@@ -76,8 +77,7 @@ object CatalystTypeConverters {
     case LongType => LongConverter
     case FloatType => FloatConverter
     case DoubleType => DoubleConverter
-    // TODO(SPARK-35726): Truncate java.time.Duration by fields of day-time interval type
-    case _: DayTimeIntervalType => DurationConverter
+    case DayTimeIntervalType(_, endField) => DurationConverter(endField)
     case YearMonthIntervalType(_, endField) => PeriodConverter(endField)
     case dataType: DataType => IdentityConverter(dataType)
   }
@@ -432,9 +432,10 @@ object CatalystTypeConverters {
     override def toScalaImpl(row: InternalRow, column: Int): Double = row.getDouble(column)
   }

-  private object DurationConverter extends CatalystTypeConverter[Duration, Duration, Any] {
+  private case class DurationConverter(endField: Byte)
+    extends CatalystTypeConverter[Duration, Duration, Any] {
     override def toCatalystImpl(scalaValue: Duration): Long = {
-      IntervalUtils.durationToMicros(scalaValue)
+      IntervalUtils.durationToMicros(scalaValue, endField)
     }
     override def toScala(catalystValue: Any): Duration = {
       if (catalystValue == null) null
@@ -523,7 +524,7 @@ object CatalystTypeConverters {
         map,
         (key: Any) => convertToCatalyst(key),
         (value: Any) => convertToCatalyst(value))
-    case d: Duration => DurationConverter.toCatalyst(d)
+    case d: Duration => DurationConverter(SECOND).toCatalyst(d)
     case p: Period => PeriodConverter(MONTH).toCatalyst(p)
     case other => other
   }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
index e87ea51..3f56d2f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
@@ -892,18 +892,29 @@ object IntervalUtils {
    * @throws ArithmeticException If numeric overflow occurs
    */
   def durationToMicros(duration: Duration): Long = {
+    durationToMicros(duration, DayTimeIntervalType.SECOND)
+  }
+
+  def durationToMicros(duration: Duration, endField: Byte): Long = {
     val seconds = duration.getSeconds
-    if (seconds == minDurationSeconds) {
+    val micros = if (seconds == minDurationSeconds) {
       val microsInSeconds = (minDurationSeconds + 1) * MICROS_PER_SECOND
       val nanoAdjustment = duration.getNano
       assert(0 <= nanoAdjustment && nanoAdjustment < NANOS_PER_SECOND,
         "Duration.getNano() must return the adjustment to the seconds field " +
-        "in the range from 0 to 999999999 nanoseconds, inclusive.")
+          "in the range from 0 to 999999999 nanoseconds, inclusive.")
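The new `durationToMicros(duration, endField)` overload truncates the microsecond value below the interval's end field, so that e.g. a `DAY TO HOUR` interval carries no minutes, seconds, or microseconds. A minimal Python sketch of that truncation semantics (hypothetical constant names mirroring `DayTimeIntervalType`; non-negative durations only — Spark's real implementation also handles negative durations and overflow near `Long.MinValue`):

```python
from datetime import timedelta

# Field codes mirroring DayTimeIntervalType.DAY / HOUR / MINUTE / SECOND.
DAY, HOUR, MINUTE, SECOND = 0, 1, 2, 3

MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE
MICROS_PER_DAY = 24 * MICROS_PER_HOUR

# Granularity to truncate to for each end field; SECOND keeps microseconds,
# since the SECOND field includes the fractional part.
_TRUNC_UNIT = {DAY: MICROS_PER_DAY, HOUR: MICROS_PER_HOUR,
               MINUTE: MICROS_PER_MINUTE, SECOND: 1}

def duration_to_micros(d: timedelta, end_field: int) -> int:
    """Convert a duration to microseconds truncated at end_field,
    roughly like IntervalUtils.durationToMicros(duration, endField)."""
    micros = round(d.total_seconds() * MICROS_PER_SECOND)
    unit = _TRUNC_UNIT[end_field]
    return micros - micros % unit  # drop precision below the end field

d = timedelta(days=1, hours=2, minutes=3, seconds=4)
print(duration_to_micros(d, DAY) // MICROS_PER_DAY)  # 1
```

With end field DAY, everything below a whole day is dropped; with end field SECOND, the full microsecond value is preserved.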
[spark] branch master updated: [SPARK-35818][BUILD] Upgrade SBT to 1.5.4
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 94f7015  [SPARK-35818][BUILD] Upgrade SBT to 1.5.4
94f7015 is described below

commit 94f701587de7cd2ebc8c9b6de56e669345538112
Author: Dongjoon Hyun
AuthorDate: Sat Jun 19 00:17:35 2021 -0700

    [SPARK-35818][BUILD] Upgrade SBT to 1.5.4

    ### What changes were proposed in this pull request?
    This PR aims to upgrade SBT to 1.5.4.

    ### Why are the changes needed?
    SBT 1.5.4 was released 5 days ago:
    - https://github.com/sbt/sbt/releases/tag/v1.5.4

    This brings the latest bug fixes, including:
    - Fixes BSP on ARM Macs by keeping the JNI server socket so it keeps using JNI
    - Fixes the compiler ClassLoader list to use compilerJars.toList (for Scala 3, this drops support for 3.0.0-M2)
    - Fixes undercompilation of package objects causing "Symbol 'type X' is missing from the classpath"
    - Fixes overcompilation with the scalac -release flag
    - Fixes build/exit notification not closing the BSP channel
    - Fixes the POM file's Maven repository ID character restriction to match that of Maven

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs.

    Closes #32966 from dongjoon-hyun/SPARK-35818.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 project/build.properties | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/project/build.properties b/project/build.properties
index bef3260..78a23cc 100644
--- a/project/build.properties
+++ b/project/build.properties
@@ -14,4 +14,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-sbt.version=1.5.3
+sbt.version=1.5.4
[spark] branch master updated: [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in KubernetesLocalDiskShuffleDataIOSuite
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b9d6473  [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in KubernetesLocalDiskShuffleDataIOSuite
b9d6473 is described below

commit b9d6473e898cea255bbbc27f657e2958fd4c011b
Author: Dongjoon Hyun
AuthorDate: Sat Jun 19 15:22:29 2021 +0900

    [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in KubernetesLocalDiskShuffleDataIOSuite

    ### What changes were proposed in this pull request?
    This increases the timeout from 10 seconds to 60 seconds in KubernetesLocalDiskShuffleDataIOSuite to reduce the flakiness.

    ### Why are the changes needed?
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140003/testReport/

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs.

    Closes #32967 from dongjoon-hyun/SPARK-35593-2.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Kousuke Saruta
---
 .../apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
index e94e8dd..eca38a8 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
@@ -210,7 +210,7 @@ class KubernetesLocalDiskShuffleDataIOSuite extends SparkFunSuite with LocalSpar
       assert(master.shuffleStatuses(1).mapStatuses.forall(_ == null))
     }
     sc.parallelize(Seq((1, 1)), 2).groupByKey().collect()
-    eventually(timeout(10.second), interval(1.seconds)) {
+    eventually(timeout(60.second), interval(1.seconds)) {
       assert(master.shuffleStatuses(0).mapStatuses.map(_.mapId).toSet == Set(0, 1, 2))
       assert(master.shuffleStatuses(1).mapStatuses.map(_.mapId).toSet == Set(6, 7, 8))
     }
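The fix relies on Scalatest's `eventually(timeout(...), interval(...))` retry pattern: re-run the assertions every `interval` until they pass or the deadline expires. A minimal Python sketch of the same pattern (an illustration of the idea, not Scalatest's implementation):

```python
import time

def eventually(assertion, timeout=60.0, interval=1.0):
    """Re-run `assertion` until it stops raising AssertionError or the
    deadline passes, loosely mirroring Scalatest's
    eventually(timeout(...), interval(...))."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return assertion()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise  # out of patience: surface the last failure
            time.sleep(interval)

# Example: an assertion that only passes once some background state settles.
seen = []
def check():
    seen.append(1)
    assert len(seen) >= 3, "not ready yet"

eventually(check, timeout=5.0, interval=0.01)
```

Raising the timeout, as this commit does, widens the deadline without changing how often the condition is polled — a cheap way to deflake a test whose condition eventually holds but sometimes takes longer than expected.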