[spark] branch master updated (1589d32 -> 6d30991)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1589d32  [SPARK-35472][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.generic
     add 6d30991  [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase

No new revisions were added by this update.

Summary of changes:
 python/pyspark/context.py           |  5 +-
 python/pyspark/ml/classification.py |  4 +-
 python/pyspark/ml/tuning.py         | 10 ++--
 python/pyspark/util.py              | 99 +
 4 files changed, 79 insertions(+), 39 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9eaf678 -> 1589d32)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9eaf678  [SPARK-35830][TESTS] Upgrade sbt-mima-plugin to 0.9.2
     add 1589d32  [SPARK-35472][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.generic

No new revisions were added by this update.

Summary of changes:
 python/mypy.ini                  |   3 -
 python/pyspark/pandas/frame.py   |  19 +++-
 python/pyspark/pandas/generic.py | 223 ---
 python/pyspark/pandas/series.py  |  19 +++-
 4 files changed, 167 insertions(+), 97 deletions(-)
[spark] branch master updated (86bcd1f -> 9eaf678)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 86bcd1f  [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType
     add 9eaf678  [SPARK-35830][TESTS] Upgrade sbt-mima-plugin to 0.9.2

No new revisions were added by this update.

Summary of changes:
 project/plugins.sbt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 86bcd1f  [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType
86bcd1f is described below

commit 86bcd1fba09d9b5e4d36a48824354aaae769fa21
Author: Angerszh
AuthorDate: Sat Jun 19 21:43:06 2021 +0300

    [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType

    ### What changes were proposed in this pull request?
    Support Cast between different field YearMonthIntervalTypes.

    ### Why are the changes needed?
    To make it convenient for users to convert between different field YearMonthIntervalTypes.

    ### Does this PR introduce _any_ user-facing change?
    Users can cast YearMonthIntervalType(YEAR, MONTH) to YearMonthIntervalType(YEAR, YEAR), etc.

    ### How was this patch tested?
    Added a unit test.

    Closes #32974 from AngersZh/SPARK-35819.

    Authored-by: Angerszh
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/expressions/Cast.scala  | 12 ++++++++++++
 .../apache/spark/sql/catalyst/expressions/CastSuite.scala | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 52801ec..cdf0753 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -82,6 +82,8 @@ object Cast {
     case (StringType, _: DayTimeIntervalType) => true
     case (StringType, _: YearMonthIntervalType) => true
+    case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true
+
     case (StringType, _: NumericType) => true
     case (BooleanType, _: NumericType) => true
     case (DateType, _: NumericType) => true
@@ -580,6 +582,8 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
       it: YearMonthIntervalType): Any => Any = from match {
     case StringType => buildCast[UTF8String](_, s =>
       IntervalUtils.castStringToYMInterval(s, it.startField, it.endField))
+    case _: YearMonthIntervalType => buildCast[Int](_, s =>
+      IntervalUtils.periodToMonths(IntervalUtils.monthsToPeriod(s), it.endField))
   }

   // LongConverter
@@ -1481,6 +1485,12 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
         code"""
           $evPrim = $util.castStringToYMInterval($c, (byte)${it.startField}, (byte)${it.endField});
         """
+    case _: YearMonthIntervalType =>
+      val util = IntervalUtils.getClass.getCanonicalName.stripSuffix("$")
+      (c, evPrim, _) =>
+        code"""
+          $evPrim = $util.periodToMonths($util.monthsToPeriod($c), (byte)${it.endField});
+        """
   }

   private[this] def decimalToTimestampCode(d: ExprValue): Block = {
@@ -2051,6 +2061,8 @@ object AnsiCast {
     case (StringType, _: DayTimeIntervalType) => true
     case (StringType, _: YearMonthIntervalType) => true
+    case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true
+
     case (StringType, DateType) => true
     case (TimestampType, DateType) => true
     case (TimestampWithoutTZType, DateType) => true
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index d114968..51c3681 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -30,6 +30,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
+import org.apache.spark.sql.types.YearMonthIntervalType._
 import org.apache.spark.unsafe.types.UTF8String

 /**
@@ -662,4 +663,14 @@ class CastSuite extends CastSuiteBase {
       checkEvaluation(cast(invalidInput, TimestampWithoutTZType), null)
     }
   }
+
+  test("SPARK-35819: Support cast YearMonthIntervalType in different fields") {
+    val ym = cast(Literal.create("1-1"), YearMonthIntervalType(YEAR, MONTH))
+    Seq(YearMonthIntervalType(YEAR, YEAR) -> 12,
+      YearMonthIntervalType(YEAR, MONTH) -> 13,
+      YearMonthIntervalType(MONTH, MONTH) -> 13)
+      .foreach { case (dt, value) =>
+        checkEvaluation(cast(ym, dt), value)
+      }
+  }
 }
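The new cast path above converts the interval's underlying month count to a Period and back, truncating at the target end field — which is why the suite expects `"1-1"` (13 months) to become 12 when cast to `YearMonthIntervalType(YEAR, YEAR)`. A rough model of that truncation in plain Python (hypothetical helper names mirroring `IntervalUtils.monthsToPeriod`/`periodToMonths`; not Spark's actual code, and assuming non-negative intervals):

```python
# Field codes, mirroring YearMonthIntervalType.YEAR / MONTH.
YEAR, MONTH = 0, 1

def months_to_period(months: int):
    """Split a month count into (years, months), like IntervalUtils.monthsToPeriod."""
    return months // 12, months % 12

def period_to_months(period, end_field: int) -> int:
    """Collapse (years, months) back to months, truncating below end_field,
    like IntervalUtils.periodToMonths(period, endField)."""
    years, months = period
    if end_field == YEAR:
        return years * 12       # end field YEAR: drop the leftover months
    return years * 12 + months  # end field MONTH: keep full month precision

# INTERVAL '1-1' YEAR TO MONTH is 13 months internally.
ym = 13
print(period_to_months(months_to_period(ym), YEAR))   # 12
print(period_to_months(months_to_period(ym), MONTH))  # 13
```

This matches the expectations in the added test: 12 for an end field of YEAR, 13 when the end field is MONTH.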
[spark] branch master updated (aab37ed -> a39f1ea)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from aab37ed  [SPARK-35593][K8S][TESTS][FOLLOWUP] Run KubernetesLocalDiskShuffleDataIOSuite on a dedicated JVM
     add a39f1ea  [SPARK-35824][CORE][TESTS] Convert LevelDBSuite.IntKeyType from a nested class to a normal class

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/util/kvstore/DBIteratorSuite.java |  2 +-
 .../{ArrayKeyIndexType.java => IntKeyType.java}    | 19 ---
 .../apache/spark/util/kvstore/LevelDBSuite.java    | 27 --
 3 files changed, 12 insertions(+), 36 deletions(-)
 copy common/kvstore/src/test/java/org/apache/spark/util/kvstore/{ArrayKeyIndexType.java => IntKeyType.java} (75%)
[spark] branch master updated (74d647d -> aab37ed)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 74d647d  [SPARK-35825][INFRA] Increase the heap and stack size for Maven build
     add aab37ed  [SPARK-35593][K8S][TESTS][FOLLOWUP] Run KubernetesLocalDiskShuffleDataIOSuite on a dedicated JVM

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-35825][INFRA] Increase the heap and stack size for Maven build
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 74d647d  [SPARK-35825][INFRA] Increase the heap and stack size for Maven build
74d647d is described below

commit 74d647d2ca6b0471f0eb90a59bccb1ecc0a9cc8f
Author: Gengliang Wang
AuthorDate: Sat Jun 19 10:44:46 2021 -0700

    [SPARK-35825][INFRA] Increase the heap and stack size for Maven build

    ### What changes were proposed in this pull request?
    Increase the memory configuration for the Maven build:
    - Stack size: 64MB => 128MB
    - Initial heap size: 1024MB => 2048MB
    - Maximum heap size: 1024MB => 2048MB

    The SBT builds are OK, so let's keep their current configuration.

    ### Why are the changes needed?
    The Jenkins jobs are unstable due to stack-overflow errors:
    - https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-jdk-11/
    - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2274/

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Jenkins test.

    Closes #32961 from gengliangwang/increaseXss.

    Authored-by: Gengliang Wang
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pom.xml b/pom.xml
index e76c5f9..8bcb3a8 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2604,9 +2604,9 @@
           -P:silencer:globalFilters=.*deprecated.*
-          -Xss64m
-          -Xms1024m
-          -Xmx1024m
+          -Xss128m
+          -Xms2048m
+          -Xmx2048m
           -XX:ReservedCodeCacheSize=${CodeCacheSize}
[spark] branch master updated: [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2ebad72  [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type
2ebad72 is described below

commit 2ebad727587e25b8bf4a8439593e7402ea4e2827
Author: Angerszh
AuthorDate: Sat Jun 19 13:51:21 2021 +0300

    [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type

    ### What changes were proposed in this pull request?
    Support truncating java.time.Duration by fields of the day-time interval type.

    ### Why are the changes needed?
    To respect fields of the target day-time interval types.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Added a unit test.

    Closes #32950 from AngersZh/SPARK-35726.

    Authored-by: Angerszh
    Signed-off-by: Max Gekk
---
 .../spark/sql/catalyst/CatalystTypeConverters.scala | 11 ++-
 .../spark/sql/catalyst/util/IntervalUtils.scala     | 15 +--
 .../org/apache/spark/sql/RandomDataGenerator.scala  | 13 -
 .../sql/catalyst/CatalystTypeConvertersSuite.scala  | 20
 4 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
index 08a5fd5..1742524 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
@@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.util._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
+import org.apache.spark.sql.types.DayTimeIntervalType._
 import org.apache.spark.sql.types.YearMonthIntervalType._
 import org.apache.spark.unsafe.types.UTF8String
@@ -76,8 +77,7 @@ object CatalystTypeConverters {
     case LongType => LongConverter
     case FloatType => FloatConverter
     case DoubleType => DoubleConverter
-    // TODO(SPARK-35726): Truncate java.time.Duration by fields of day-time interval type
-    case _: DayTimeIntervalType => DurationConverter
+    case DayTimeIntervalType(_, endField) => DurationConverter(endField)
     case YearMonthIntervalType(_, endField) => PeriodConverter(endField)
     case dataType: DataType => IdentityConverter(dataType)
   }
@@ -432,9 +432,10 @@ object CatalystTypeConverters {
     override def toScalaImpl(row: InternalRow, column: Int): Double = row.getDouble(column)
   }

-  private object DurationConverter extends CatalystTypeConverter[Duration, Duration, Any] {
+  private case class DurationConverter(endField: Byte)
+    extends CatalystTypeConverter[Duration, Duration, Any] {
     override def toCatalystImpl(scalaValue: Duration): Long = {
-      IntervalUtils.durationToMicros(scalaValue)
+      IntervalUtils.durationToMicros(scalaValue, endField)
     }
     override def toScala(catalystValue: Any): Duration = {
       if (catalystValue == null) null
@@ -523,7 +524,7 @@ object CatalystTypeConverters {
         map,
         (key: Any) => convertToCatalyst(key),
         (value: Any) => convertToCatalyst(value))
-    case d: Duration => DurationConverter.toCatalyst(d)
+    case d: Duration => DurationConverter(SECOND).toCatalyst(d)
     case p: Period => PeriodConverter(MONTH).toCatalyst(p)
     case other => other
   }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
index e87ea51..3f56d2f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
@@ -892,18 +892,29 @@ object IntervalUtils {
    * @throws ArithmeticException If numeric overflow occurs
    */
   def durationToMicros(duration: Duration): Long = {
+    durationToMicros(duration, DayTimeIntervalType.SECOND)
+  }
+
+  def durationToMicros(duration: Duration, endField: Byte): Long = {
     val seconds = duration.getSeconds
-    if (seconds == minDurationSeconds) {
+    val micros = if (seconds == minDurationSeconds) {
       val microsInSeconds = (minDurationSeconds + 1) * MICROS_PER_SECOND
       val nanoAdjustment = duration.getNano
       assert(0 <= nanoAdjustment && nanoAdjustment < NANOS_PER_SECOND,
         "Duration.getNano() must return the adjustment to the seconds field " +
-        "in the range from 0 to 999999999 nanoseconds, inclusive.")
+          "in the range from 0 to 999999999 nanoseconds, inclusive.")
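The new `durationToMicros(duration, endField)` overload truncates the microsecond value below the interval's end field, so that e.g. a `DAY TO HOUR` interval carries no minutes, seconds, or microseconds. A minimal Python sketch of that truncation semantics (hypothetical constant names mirroring `DayTimeIntervalType`; non-negative durations only — Spark's real implementation also handles negative durations and overflow near `Long.MinValue`):

```python
from datetime import timedelta

# Field codes mirroring DayTimeIntervalType.DAY / HOUR / MINUTE / SECOND.
DAY, HOUR, MINUTE, SECOND = 0, 1, 2, 3

MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE
MICROS_PER_DAY = 24 * MICROS_PER_HOUR

# Granularity to truncate to for each end field; SECOND keeps microseconds,
# since the SECOND field includes the fractional part.
_TRUNC_UNIT = {DAY: MICROS_PER_DAY, HOUR: MICROS_PER_HOUR,
               MINUTE: MICROS_PER_MINUTE, SECOND: 1}

def duration_to_micros(d: timedelta, end_field: int) -> int:
    """Convert a duration to microseconds truncated at end_field,
    roughly like IntervalUtils.durationToMicros(duration, endField)."""
    micros = round(d.total_seconds() * MICROS_PER_SECOND)
    unit = _TRUNC_UNIT[end_field]
    return micros - micros % unit  # drop precision below the end field

d = timedelta(days=1, hours=2, minutes=3, seconds=4)
print(duration_to_micros(d, DAY) // MICROS_PER_DAY)  # 1
```

With end field DAY, everything below a whole day is dropped; with end field SECOND, the full microsecond value is preserved.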
[spark] branch master updated: [SPARK-35818][BUILD] Upgrade SBT to 1.5.4
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 94f7015  [SPARK-35818][BUILD] Upgrade SBT to 1.5.4
94f7015 is described below

commit 94f701587de7cd2ebc8c9b6de56e669345538112
Author: Dongjoon Hyun
AuthorDate: Sat Jun 19 00:17:35 2021 -0700

    [SPARK-35818][BUILD] Upgrade SBT to 1.5.4

    ### What changes were proposed in this pull request?
    This PR aims to upgrade SBT to 1.5.4.

    ### Why are the changes needed?
    SBT 1.5.4 was released 5 days ago:
    - https://github.com/sbt/sbt/releases/tag/v1.5.4

    This brings the latest bug fixes, including:
    - Fixes BSP on ARM Macs by keeping the JNI server socket so it keeps using JNI
    - Fixes the compiler ClassLoader list to use compilerJars.toList (for Scala 3, this drops support for 3.0.0-M2)
    - Fixes undercompilation of package objects causing "Symbol 'type X' is missing from the classpath"
    - Fixes overcompilation with the scalac -release flag
    - Fixes build/exit notification not closing the BSP channel
    - Fixes the POM file's Maven repository ID character restriction to match that of Maven

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs.

    Closes #32966 from dongjoon-hyun/SPARK-35818.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 project/build.properties | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/project/build.properties b/project/build.properties
index bef3260..78a23cc 100644
--- a/project/build.properties
+++ b/project/build.properties
@@ -14,4 +14,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-sbt.version=1.5.3
+sbt.version=1.5.4
[spark] branch master updated: [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in KubernetesLocalDiskShuffleDataIOSuite
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b9d6473  [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in KubernetesLocalDiskShuffleDataIOSuite
b9d6473 is described below

commit b9d6473e898cea255bbbc27f657e2958fd4c011b
Author: Dongjoon Hyun
AuthorDate: Sat Jun 19 15:22:29 2021 +0900

    [SPARK-35593][K8S][TESTS][FOLLOWUP] Increase timeout in KubernetesLocalDiskShuffleDataIOSuite

    ### What changes were proposed in this pull request?
    This increases the timeout from 10 seconds to 60 seconds in KubernetesLocalDiskShuffleDataIOSuite to reduce the flakiness.

    ### Why are the changes needed?
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140003/testReport/

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs.

    Closes #32967 from dongjoon-hyun/SPARK-35593-2.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Kousuke Saruta
---
 .../apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
index e94e8dd..eca38a8 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
@@ -210,7 +210,7 @@ class KubernetesLocalDiskShuffleDataIOSuite extends SparkFunSuite with LocalSpar
       assert(master.shuffleStatuses(1).mapStatuses.forall(_ == null))
     }
     sc.parallelize(Seq((1, 1)), 2).groupByKey().collect()
-    eventually(timeout(10.second), interval(1.seconds)) {
+    eventually(timeout(60.second), interval(1.seconds)) {
       assert(master.shuffleStatuses(0).mapStatuses.map(_.mapId).toSet == Set(0, 1, 2))
       assert(master.shuffleStatuses(1).mapStatuses.map(_.mapId).toSet == Set(6, 7, 8))
     }
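The fix relies on Scalatest's `eventually(timeout(...), interval(...))` retry pattern: re-run the assertions every `interval` until they pass or the deadline expires. A minimal Python sketch of the same pattern (an illustration of the idea, not Scalatest's implementation):

```python
import time

def eventually(assertion, timeout=60.0, interval=1.0):
    """Re-run `assertion` until it stops raising AssertionError or the
    deadline passes, loosely mirroring Scalatest's
    eventually(timeout(...), interval(...))."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return assertion()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise  # out of patience: surface the last failure
            time.sleep(interval)

# Example: an assertion that only passes once some background state settles.
seen = []
def check():
    seen.append(1)
    assert len(seen) >= 3, "not ready yet"

eventually(check, timeout=5.0, interval=0.01)
```

Raising the timeout, as this commit does, widens the deadline without changing how often the condition is polled — a cheap way to deflake a test whose condition eventually holds but sometimes takes longer than expected.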