[jira] [Issue Comment Deleted] (SPARK-37105) Pass all UTs in `sql/hive` with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-37105:
-
    Comment: was deleted

(was: Add `extraJavaTestArgs` to sql/hive pom.xml and run
{code:java}
build/mvn clean install -Phadoop-3.2 -Phive-2.3 -Phive -pl sql/hive
{code}
there are 22 failed tests
{code:java}
Run completed in 1 hour, 2 minutes, 3 seconds.
Total number of tests run: 3547
Suites: completed 117, aborted 0
Tests: succeeded 3525, failed 22, canceled 6, ignored 605, pending 0
*** 22 TESTS FAILED ***
{code})

> Pass all UTs in `sql/hive` with Java 17
> ---
>
> Key: SPARK-37105
> URL: https://issues.apache.org/jira/browse/SPARK-37105
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Yang Jie
> Priority: Major
>
> Add `extraJavaTestArgs` to sql/hive pom.xml and run
> {code:java}
> build/mvn clean install -Phadoop-3.2 -Phive-2.3 -Phive -pl sql/hive
> {code}
> there are 22 failed tests
> {code:java}
> Run completed in 1 hour, 2 minutes, 3 seconds.
> Total number of tests run: 3547
> Suites: completed 117, aborted 0
> Tests: succeeded 3525, failed 22, canceled 6, ignored 605, pending 0
> *** 22 TESTS FAILED ***
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37105) Pass all UTs in `sql/hive` with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-37105:
-
    Description:
Add `extraJavaTestArgs` to sql/hive pom.xml and run
{code:java}
build/mvn clean install -Phadoop-3.2 -Phive-2.3 -Phive -pl sql/hive
{code}
there are 22 failed tests
{code:java}
Run completed in 1 hour, 2 minutes, 3 seconds.
Total number of tests run: 3547
Suites: completed 117, aborted 0
Tests: succeeded 3525, failed 22, canceled 6, ignored 605, pending 0
*** 22 TESTS FAILED ***
{code}

> Pass all UTs in `sql/hive` with Java 17
> ---
>
> Key: SPARK-37105
> URL: https://issues.apache.org/jira/browse/SPARK-37105
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Yang Jie
> Priority: Major
>
> Add `extraJavaTestArgs` to sql/hive pom.xml and run
> {code:java}
> build/mvn clean install -Phadoop-3.2 -Phive-2.3 -Phive -pl sql/hive
> {code}
> there are 22 failed tests
> {code:java}
> Run completed in 1 hour, 2 minutes, 3 seconds.
> Total number of tests run: 3547
> Suites: completed 117, aborted 0
> Tests: succeeded 3525, failed 22, canceled 6, ignored 605, pending 0
> *** 22 TESTS FAILED ***
> {code}
[jira] [Commented] (SPARK-37105) Pass all UTs in `sql/hive` with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433562#comment-17433562 ]

Yang Jie commented on SPARK-37105:
--

Add `extraJavaTestArgs` to sql/hive pom.xml and run
{code:java}
build/mvn clean install -Phadoop-3.2 -Phive-2.3 -Phive -pl sql/hive
{code}
there are 22 failed tests
{code:java}
Run completed in 1 hour, 2 minutes, 3 seconds.
Total number of tests run: 3547
Suites: completed 117, aborted 0
Tests: succeeded 3525, failed 22, canceled 6, ignored 605, pending 0
*** 22 TESTS FAILED ***
{code}

> Pass all UTs in `sql/hive` with Java 17
> ---
>
> Key: SPARK-37105
> URL: https://issues.apache.org/jira/browse/SPARK-37105
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Yang Jie
> Priority: Major
>
[jira] [Resolved] (SPARK-37101) In class ShuffleBlockPusher, use config instead of key
[ https://issues.apache.org/jira/browse/SPARK-37101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37101.
--
    Fix Version/s: 3.3.0
         Assignee: jinhai
       Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/34372

> In class ShuffleBlockPusher, use config instead of key
> --
>
> Key: SPARK-37101
> URL: https://issues.apache.org/jira/browse/SPARK-37101
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 3.3.0
> Reporter: jinhai
> Assignee: jinhai
> Priority: Major
> Fix For: 3.3.0
>
>
> In class ShuffleBlockPusher
> {code:java}
> // code placeholder
> private[this] val maxBytesInFlight =
>   conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024
> private[this] val maxReqsInFlight =
>   conf.getInt("spark.reducer.maxReqsInFlight", Int.MaxValue)
> {code}
> We can use config.REDUCER_MAX_SIZE_IN_FLIGHT and
> config.REDUCER_MAX_REQS_IN_FLIGHT instead.
[jira] [Assigned] (SPARK-37107) Inline type hints for files in python/pyspark/status.py
[ https://issues.apache.org/jira/browse/SPARK-37107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37107:

    Assignee: (was: Apache Spark)

> Inline type hints for files in python/pyspark/status.py
> ---
>
> Key: SPARK-37107
> URL: https://issues.apache.org/jira/browse/SPARK-37107
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Priority: Major
>
[jira] [Assigned] (SPARK-37107) Inline type hints for files in python/pyspark/status.py
[ https://issues.apache.org/jira/browse/SPARK-37107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37107:

    Assignee: Apache Spark

> Inline type hints for files in python/pyspark/status.py
> ---
>
> Key: SPARK-37107
> URL: https://issues.apache.org/jira/browse/SPARK-37107
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Assignee: Apache Spark
> Priority: Major
>
[jira] [Commented] (SPARK-37107) Inline type hints for files in python/pyspark/status.py
[ https://issues.apache.org/jira/browse/SPARK-37107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433548#comment-17433548 ]

Apache Spark commented on SPARK-37107:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34375

> Inline type hints for files in python/pyspark/status.py
> ---
>
> Key: SPARK-37107
> URL: https://issues.apache.org/jira/browse/SPARK-37107
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Priority: Major
>
[jira] [Created] (SPARK-37107) Inline type hints for files in python/pyspark/status.py
dch nguyen created SPARK-37107:
--

             Summary: Inline type hints for files in python/pyspark/status.py
                 Key: SPARK-37107
                 URL: https://issues.apache.org/jira/browse/SPARK-37107
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: dch nguyen
[jira] [Updated] (SPARK-37105) Pass all UTs in `sql/hive` with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-37105:
-
    Summary: Pass all UTs in `sql/hive` with Java 17  (was: Pass all UTs in `sql/core` with Java 17)

> Pass all UTs in `sql/hive` with Java 17
> ---
>
> Key: SPARK-37105
> URL: https://issues.apache.org/jira/browse/SPARK-37105
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Yang Jie
> Priority: Major
>
[jira] [Created] (SPARK-37105) Pass all UTs in `sql/core` with Java 17
Yang Jie created SPARK-37105:
-

             Summary: Pass all UTs in `sql/core` with Java 17
                 Key: SPARK-37105
                 URL: https://issues.apache.org/jira/browse/SPARK-37105
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Yang Jie
[jira] [Created] (SPARK-37106) Pass all UTs in `yarn` with Java 17
Yang Jie created SPARK-37106:
-

             Summary: Pass all UTs in `yarn` with Java 17
                 Key: SPARK-37106
                 URL: https://issues.apache.org/jira/browse/SPARK-37106
             Project: Spark
          Issue Type: Sub-task
          Components: YARN
    Affects Versions: 3.3.0
            Reporter: Yang Jie
[jira] [Assigned] (SPARK-36968) ps.Series.dot raise "matrices are not aligned" if index is not same
[ https://issues.apache.org/jira/browse/SPARK-36968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-36968:

    Assignee: dch nguyen

> ps.Series.dot raise "matrices are not aligned" if index is not same
> ---
>
> Key: SPARK-36968
> URL: https://issues.apache.org/jira/browse/SPARK-36968
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Assignee: dch nguyen
> Priority: Major
>
[jira] [Resolved] (SPARK-36968) ps.Series.dot raise "matrices are not aligned" if index is not same
[ https://issues.apache.org/jira/browse/SPARK-36968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-36968.
--
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34235
[https://github.com/apache/spark/pull/34235]

> ps.Series.dot raise "matrices are not aligned" if index is not same
> ---
>
> Key: SPARK-36968
> URL: https://issues.apache.org/jira/browse/SPARK-36968
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Assignee: dch nguyen
> Priority: Major
> Fix For: 3.3.0
>
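The alignment semantics behind SPARK-36968 can be sketched in plain Python, with dicts standing in for Series indexes (a hedged simplification: `aligned_dot` is a hypothetical helper, not the pandas-on-Spark API, and it covers only the equal-label-set case the issue describes):

```python
def aligned_dot(s1: dict, s2: dict) -> float:
    """Dot product of two label->value 'series', pairing entries by index
    label rather than by position.

    Two series whose labels are the same set but differently ordered still
    work (the case fixed by SPARK-36968); if the label sets genuinely
    differ, raise the same error pandas uses.
    """
    if s1.keys() != s2.keys():
        raise ValueError("matrices are not aligned")
    # Pair by label, so {"a": 1, "b": 2} dot {"b": 10, "a": 100} works.
    return sum(v * s2[k] for k, v in s1.items())
```

The point of the fix is that only a genuine label-set mismatch should raise; a mere ordering difference is resolved by alignment.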
[jira] [Updated] (SPARK-37101) In class ShuffleBlockPusher, use config instead of key
[ https://issues.apache.org/jira/browse/SPARK-37101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-37101:

    Affects Version/s: (was: 3.2.0)
                       3.3.0

> In class ShuffleBlockPusher, use config instead of key
> --
>
> Key: SPARK-37101
> URL: https://issues.apache.org/jira/browse/SPARK-37101
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 3.3.0
> Reporter: jinhai
> Priority: Major
>
> In class ShuffleBlockPusher
> {code:java}
> // code placeholder
> private[this] val maxBytesInFlight =
>   conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024
> private[this] val maxReqsInFlight =
>   conf.getInt("spark.reducer.maxReqsInFlight", Int.MaxValue)
> {code}
> We can use config.REDUCER_MAX_SIZE_IN_FLIGHT and
> config.REDUCER_MAX_REQS_IN_FLIGHT instead.
[jira] [Commented] (SPARK-36057) Support volcano/alternative schedulers
[ https://issues.apache.org/jira/browse/SPARK-36057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433527#comment-17433527 ]

Jiaxin Shan commented on SPARK-36057:
-

Is it better to do this at the operator level?

> Support volcano/alternative schedulers
> --
>
> Key: SPARK-36057
> URL: https://issues.apache.org/jira/browse/SPARK-36057
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.2.0
> Reporter: Holden Karau
> Priority: Major
>
> This is an umbrella issue for tracking the work for supporting Volcano &
> Yunikorn on Kubernetes. These schedulers provide more YARN-like features
> (such as queues and minimum resources before scheduling jobs) that many folks
> want on Kubernetes.
>
> Yunikorn is an ASF project & Volcano is a CNCF project (sig-batch).
>
> They've taken slightly different approaches to solving the same problem, but
> from Spark's point of view we should be able to share much of the code.
>
> See the initial brainstorming discussion in SPARK-35623.
[jira] [Resolved] (SPARK-37089) ParquetFileFormat registers task completion listeners lazily, causing Python writer thread to segfault when off-heap vectorized reader is enabled
[ https://issues.apache.org/jira/browse/SPARK-37089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37089.
--
    Fix Version/s: 3.2.1
                   3.3.0
       Resolution: Fixed

Issue resolved by pull request 34369
[https://github.com/apache/spark/pull/34369]

> ParquetFileFormat registers task completion listeners lazily, causing Python
> writer thread to segfault when off-heap vectorized reader is enabled
> -
>
> Key: SPARK-37089
> URL: https://issues.apache.org/jira/browse/SPARK-37089
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Ankur Dave
> Assignee: Ankur Dave
> Priority: Major
> Fix For: 3.3.0, 3.2.1
>
>
> The task completion listener that closes the vectorized reader is registered
> lazily in ParquetFileFormat#buildReaderWithPartitionValues(). Since task
> completion listeners are executed in reverse order of registration, it always
> runs before the Python writer thread can be interrupted.
>
> This contradicts the assumption in
> https://issues.apache.org/jira/browse/SPARK-37088 /
> https://github.com/apache/spark/pull/34245 that task completion listeners are
> registered bottom-up, preventing that fix from working properly.
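The listener-ordering behavior at the heart of this bug can be sketched in a few lines of Python (a toy `TaskContext`, not Spark's API; the reverse-order rule itself comes straight from the issue description above):

```python
class TaskContext:
    """Toy model of Spark's task completion listeners.

    Spark executes completion listeners in reverse order of registration
    (LIFO). So a listener registered lazily -- such as the one that closes
    the vectorized Parquet reader -- runs *before* an earlier-registered
    listener, such as the one that interrupts the Python writer thread.
    """

    def __init__(self):
        self._listeners = []

    def add_task_completion_listener(self, fn):
        self._listeners.append(fn)

    def mark_task_completed(self):
        # LIFO: last registered runs first.
        for fn in reversed(self._listeners):
            fn(self)
```

Under this model, if the writer-interrupting listener is registered first and the reader-closing listener is registered lazily afterwards, the reader is closed while the writer thread is still running, which is exactly the unsafe ordering this issue fixes.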
[jira] [Resolved] (SPARK-37081) Upgrade the version of RDBMS and corresponding JDBC drivers used by docker-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-37081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37081.
--
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34350
[https://github.com/apache/spark/pull/34350]

> Upgrade the version of RDBMS and corresponding JDBC drivers used by
> docker-integration-tests
>
> Key: SPARK-37081
> URL: https://issues.apache.org/jira/browse/SPARK-37081
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Tests
> Affects Versions: 3.3.0
> Reporter: Kousuke Saruta
> Assignee: Kousuke Saruta
> Priority: Major
> Fix For: 3.3.0
>
>
> Let's upgrade the version of RDBMS and corresponding JDBC drivers.
> Especially, PostgreSQL 14 was released recently so it's great to ensure that
> the JDBC source for PostgreSQL works with PostgreSQL 14.
[jira] [Resolved] (SPARK-37103) Switch from Maven to SBT to build Spark on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-37103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37103.
--
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34373
[https://github.com/apache/spark/pull/34373]

> Switch from Maven to SBT to build Spark on AppVeyor
> ---
>
> Key: SPARK-37103
> URL: https://issues.apache.org/jira/browse/SPARK-37103
> Project: Spark
> Issue Type: Bug
> Components: Project Infra
> Affects Versions: 3.3.0
> Reporter: Kousuke Saruta
> Assignee: Kousuke Saruta
> Priority: Major
> Fix For: 3.3.0
>
>
> Recently, building Spark on AppVeyor almost always fails due to
> StackOverflowError at compile time.
> We can't identify the reason so far, but one workaround would be building with
> SBT.
[jira] [Assigned] (SPARK-37091) Support Java 17 in SparkR SystemRequirements
[ https://issues.apache.org/jira/browse/SPARK-37091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-37091:

    Assignee: Darek

> Support Java 17 in SparkR SystemRequirements
>
> Key: SPARK-37091
> URL: https://issues.apache.org/jira/browse/SPARK-37091
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Affects Versions: 3.3.0
> Reporter: Darek
> Assignee: Darek
> Priority: Trivial
> Labels: newbie
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Please bump Java version to <= 17 in
> [DESCRIPTION|https://github.com/apache/spark/blob/f9f95686cb397271f55aaff29ec4352b4ef9aade/R/pkg/DESCRIPTION]
> Currently it is set to be:
> {code:java}
> SystemRequirements: Java (>= 8, < 12){code}
> [PR|https://github.com/apache/spark/pull/34371] has been created for this
> issue already.
[jira] [Resolved] (SPARK-37091) Support Java 17 in SparkR SystemRequirements
[ https://issues.apache.org/jira/browse/SPARK-37091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37091.
--
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34371
[https://github.com/apache/spark/pull/34371]

> Support Java 17 in SparkR SystemRequirements
>
> Key: SPARK-37091
> URL: https://issues.apache.org/jira/browse/SPARK-37091
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Affects Versions: 3.3.0
> Reporter: Darek
> Assignee: Darek
> Priority: Trivial
> Labels: newbie
> Fix For: 3.3.0
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Please bump Java version to <= 17 in
> [DESCRIPTION|https://github.com/apache/spark/blob/f9f95686cb397271f55aaff29ec4352b4ef9aade/R/pkg/DESCRIPTION]
> Currently it is set to be:
> {code:java}
> SystemRequirements: Java (>= 8, < 12){code}
> [PR|https://github.com/apache/spark/pull/34371] has been created for this
> issue already.
[jira] [Assigned] (SPARK-37037) Improve byte array sort by unify compareTo function of UTF8String and ByteArray
[ https://issues.apache.org/jira/browse/SPARK-37037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen reassigned SPARK-37037:

    Assignee: XiDuo You

> Improve byte array sort by unify compareTo function of UTF8String and
> ByteArray
>
> Key: SPARK-37037
> URL: https://issues.apache.org/jira/browse/SPARK-37037
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Minor
> Fix For: 3.3.0
>
>
> BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; however,
> it's slow since it compares the arrays using unsigned int comparison byte by
> byte.
> We can compare them using `Platform.getLong` with unsigned long comparison if
> they have more than 8 bytes. And here is some history about this `TODO`:
> [https://github.com/apache/spark/pull/6755/files#r32197461]
[jira] [Resolved] (SPARK-37037) Improve byte array sort by unify compareTo function of UTF8String and ByteArray
[ https://issues.apache.org/jira/browse/SPARK-37037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-37037.
--
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34310
[https://github.com/apache/spark/pull/34310]

> Improve byte array sort by unify compareTo function of UTF8String and
> ByteArray
>
> Key: SPARK-37037
> URL: https://issues.apache.org/jira/browse/SPARK-37037
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: XiDuo You
> Priority: Minor
> Fix For: 3.3.0
>
>
> BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; however,
> it's slow since it compares the arrays using unsigned int comparison byte by
> byte.
> We can compare them using `Platform.getLong` with unsigned long comparison if
> they have more than 8 bytes. And here is some history about this `TODO`:
> [https://github.com/apache/spark/pull/6755/files#r32197461]
[jira] [Updated] (SPARK-37037) Improve byte array sort by unify compareTo function of UTF8String and ByteArray
[ https://issues.apache.org/jira/browse/SPARK-37037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-37037:
-
    Priority: Minor  (was: Major)

> Improve byte array sort by unify compareTo function of UTF8String and
> ByteArray
>
> Key: SPARK-37037
> URL: https://issues.apache.org/jira/browse/SPARK-37037
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: XiDuo You
> Priority: Minor
>
> BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; however,
> it's slow since it compares the arrays using unsigned int comparison byte by
> byte.
> We can compare them using `Platform.getLong` with unsigned long comparison if
> they have more than 8 bytes. And here is some history about this `TODO`:
> [https://github.com/apache/spark/pull/6755/files#r32197461]
[jira] [Assigned] (SPARK-37104) RDD and DStream should be covariant
[ https://issues.apache.org/jira/browse/SPARK-37104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37104:

    Assignee: Apache Spark

> RDD and DStream should be covariant
> ---
>
> Key: SPARK-37104
> URL: https://issues.apache.org/jira/browse/SPARK-37104
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.1.0, 3.2.0, 3.3.0
> Reporter: Maciej Szymkiewicz
> Assignee: Apache Spark
> Priority: Major
>
> At the moment {{RDD}} and {{DStream}} are defined as invariant.
>
> However, they are immutable and could be marked as covariant.
[jira] [Assigned] (SPARK-37104) RDD and DStream should be covariant
[ https://issues.apache.org/jira/browse/SPARK-37104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37104:

    Assignee: (was: Apache Spark)

> RDD and DStream should be covariant
> ---
>
> Key: SPARK-37104
> URL: https://issues.apache.org/jira/browse/SPARK-37104
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.1.0, 3.2.0, 3.3.0
> Reporter: Maciej Szymkiewicz
> Priority: Major
>
> At the moment {{RDD}} and {{DStream}} are defined as invariant.
>
> However, they are immutable and could be marked as covariant.
[jira] [Commented] (SPARK-37104) RDD and DStream should be covariant
[ https://issues.apache.org/jira/browse/SPARK-37104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433427#comment-17433427 ]

Apache Spark commented on SPARK-37104:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/34374

> RDD and DStream should be covariant
> ---
>
> Key: SPARK-37104
> URL: https://issues.apache.org/jira/browse/SPARK-37104
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.1.0, 3.2.0, 3.3.0
> Reporter: Maciej Szymkiewicz
> Priority: Major
>
> At the moment {{RDD}} and {{DStream}} are defined as invariant.
>
> However, they are immutable and could be marked as covariant.
[jira] [Created] (SPARK-37104) RDD and DStream should be covariant
Maciej Szymkiewicz created SPARK-37104:
--

             Summary: RDD and DStream should be covariant
                 Key: SPARK-37104
                 URL: https://issues.apache.org/jira/browse/SPARK-37104
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0, 3.1.0, 3.3.0
            Reporter: Maciej Szymkiewicz

At the moment {{RDD}} and {{DStream}} are defined as invariant.

However, they are immutable and could be marked as covariant.
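The variance distinction SPARK-37104 is about can be shown with Python's typing module (the class names below are hypothetical illustrations; only the TypeVar declarations mirror what the change involves):

```python
from typing import Generic, TypeVar

T = TypeVar("T")                        # invariant: how RDD is annotated today
T_co = TypeVar("T_co", covariant=True)  # covariant: what the issue proposes

class InvariantRDD(Generic[T]):
    """With an invariant parameter, InvariantRDD[int] is not accepted
    where InvariantRDD[float] is expected."""

class CovariantRDD(Generic[T_co]):
    """With a covariant parameter, CovariantRDD[int] is accepted where
    CovariantRDD[float] is expected -- safe because RDDs are immutable."""
```

The difference only shows up under a static checker such as mypy; at runtime both classes behave identically, which is why the change is purely an annotation improvement.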