[jira] [Created] (SPARK-49310) Upgrade Apache Parquet to 1.14.2
Fokko Driesprong created SPARK-49310: Summary: Upgrade Apache Parquet to 1.14.2 Key: SPARK-49310 URL: https://issues.apache.org/jira/browse/SPARK-49310 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Fokko Driesprong -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-49014) Bump Apache Avro to 1.12.0
Fokko Driesprong created SPARK-49014: Summary: Bump Apache Avro to 1.12.0 Key: SPARK-49014 URL: https://issues.apache.org/jira/browse/SPARK-49014 Project: Spark Issue Type: Improvement Components: Input/Output Affects Versions: 3.4.3 Reporter: Fokko Driesprong
[jira] [Updated] (SPARK-48177) Upgrade `Parquet` to 1.14.1
[ https://issues.apache.org/jira/browse/SPARK-48177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-48177: - Summary: Upgrade `Parquet` to 1.14.1 (was: Upgrade `Parquet` to 1.14.0) > Upgrade `Parquet` to 1.14.1 > --- > > Key: SPARK-48177 > URL: https://issues.apache.org/jira/browse/SPARK-48177 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-48177) Bump Parquet to 1.14.0
Fokko Driesprong created SPARK-48177: Summary: Bump Parquet to 1.14.0 Key: SPARK-48177 URL: https://issues.apache.org/jira/browse/SPARK-48177 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.2 Reporter: Fokko Driesprong Fix For: 4.0.0
[jira] [Created] (SPARK-43594) Add LocalDateTime to anyToMicros
Fokko Driesprong created SPARK-43594: Summary: Add LocalDateTime to anyToMicros Key: SPARK-43594 URL: https://issues.apache.org/jira/browse/SPARK-43594 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Fokko Driesprong
[jira] [Updated] (SPARK-43425) Add TimestampNTZType to ColumnarBatchRow
[ https://issues.apache.org/jira/browse/SPARK-43425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-43425: - Issue Type: Bug (was: Improvement) > Add TimestampNTZType to ColumnarBatchRow > > > Key: SPARK-43425 > URL: https://issues.apache.org/jira/browse/SPARK-43425 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 3.5.0 > >
[jira] [Created] (SPARK-43425) Add TimestampNTZType to ColumnarBatchRow
Fokko Driesprong created SPARK-43425: Summary: Add TimestampNTZType to ColumnarBatchRow Key: SPARK-43425 URL: https://issues.apache.org/jira/browse/SPARK-43425 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Fokko Driesprong Fix For: 3.4.1, 3.5.0
[jira] [Updated] (SPARK-32797) Install mypy on the Jenkins CI workers
[ https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-32797: - Affects Version/s: (was: 3.0.0) 3.1.0 > Install mypy on the Jenkins CI workers > -- > > Key: SPARK-32797 > URL: https://issues.apache.org/jira/browse/SPARK-32797 > Project: Spark > Issue Type: Improvement > Components: jenkins, PySpark >Affects Versions: 3.1.0 >Reporter: Fokko Driesprong >Priority: Major > > We want to check the types of the PySpark code. This requires mypy to be > installed on the CI. Can you do this [~shaneknapp]? > Related PR: [https://github.com/apache/spark/pull/29180] > You can install this using pip: [https://pypi.org/project/mypy/] Should be > similar to flake8 and sphinx. The latest version is ok! Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32797) Install mypy on the Jenkins CI workers
[ https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-32797: - Description: We want to check the types of the PySpark code. This requires mypy to be installed on the CI. Can you do this [~shaneknapp]? Related PR: [https://github.com/apache/spark/pull/29180] You can install this using pip: [https://pypi.org/project/mypy/] Should be similar to flake8 and sphinx. The latest version is ok! Thanks! > Install mypy on the Jenkins CI workers > -- > > Key: SPARK-32797 > URL: https://issues.apache.org/jira/browse/SPARK-32797 > Project: Spark > Issue Type: Improvement > Components: jenkins, PySpark >Affects Versions: 3.0.0 >Reporter: Fokko Driesprong >Priority: Major > > We want to check the types of the PySpark code. This requires mypy to be > installed on the CI. Can you do this [~shaneknapp]? > Related PR: [https://github.com/apache/spark/pull/29180] > You can install this using pip: [https://pypi.org/project/mypy/] Should be > similar to flake8 and sphinx. The latest version is ok! Thanks!
[jira] [Created] (SPARK-32797) Install mypy on the Jenkins CI workers
Fokko Driesprong created SPARK-32797: Summary: Install mypy on the Jenkins CI workers Key: SPARK-32797 URL: https://issues.apache.org/jira/browse/SPARK-32797 Project: Spark Issue Type: Improvement Components: jenkins, PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong
[jira] [Resolved] (SPARK-32770) Add missing imports
[ https://issues.apache.org/jira/browse/SPARK-32770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved SPARK-32770. -- Resolution: Won't Fix > Add missing imports > --- > > Key: SPARK-32770 > URL: https://issues.apache.org/jira/browse/SPARK-32770 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.6, 3.0.0 >Reporter: Fokko Driesprong >Priority: Major >
[jira] [Created] (SPARK-32770) Add missing imports
Fokko Driesprong created SPARK-32770: Summary: Add missing imports Key: SPARK-32770 URL: https://issues.apache.org/jira/browse/SPARK-32770 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0, 2.4.6 Reporter: Fokko Driesprong
[jira] [Created] (SPARK-32719) Add Flake8 check for missing imports
Fokko Driesprong created SPARK-32719: Summary: Add Flake8 check for missing imports Key: SPARK-32719 URL: https://issues.apache.org/jira/browse/SPARK-32719 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong Add a Flake8 check to detect missing imports. While working on SPARK-17333 I noticed that we're missing some imports. This PR will enable a check using Flake8. One side effect is that we can't use wildcard imports, since Flake8 is unable to resolve them. However, wildcard imports aren't best practice anyway, since it can be unclear which wildcard import a specific class comes from.
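For reference, a minimal configuration fragment that enables the relevant pyflakes-derived checks through flake8 (the file location is an assumption, not taken from the Spark repository):

```ini
; setup.cfg (hypothetical location) -- pyflakes checks exposed via flake8
[flake8]
; F821: undefined name (a class used without its import)
; F401: module imported but unused
; F403: 'from module import *' used
; F405: name may be undefined, or defined from star imports
select = F401,F403,F405,F821
```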
[jira] [Commented] (SPARK-10520) Dates cannot be summarised
[ https://issues.apache.org/jira/browse/SPARK-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182266#comment-17182266 ] Fokko Driesprong commented on SPARK-10520: -- Can this issue be assigned to my name? Normally this happens automatically when opening a PR. > Dates cannot be summarised > -- > > Key: SPARK-10520 > URL: https://issues.apache.org/jira/browse/SPARK-10520 > Project: Spark > Issue Type: Bug > Components: PySpark, SparkR, SQL >Affects Versions: 1.5.0 >Reporter: Vincent Warmerdam >Priority: Major > Labels: bulk-closed > > I create a simple dataframe in R and call the summary function on it > (standard R, not SparkR). > {code} > > library(magrittr) > > df <- data.frame( > date = as.Date("2015-01-01") + 0:99, > r = runif(100) > ) > > df %>% summary > date r > Min. :2015-01-01 Min. :0.01221 > 1st Qu.:2015-01-25 1st Qu.:0.30003 > Median :2015-02-19 Median :0.46416 > Mean :2015-02-19 Mean :0.50350 > 3rd Qu.:2015-03-16 3rd Qu.:0.73361 > Max. :2015-04-10 Max. :0.99618 > {code} > Notice that the date can be summarised here. In SparkR, this will give an > error. > {code} > > ddf <- createDataFrame(sqlContext, df) > > ddf %>% summary > Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
> org.apache.spark.sql.AnalysisException: cannot resolve 'avg(date)' due to > data type mismatch: function average requires numeric types, not DateType; > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:61) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:292) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:290) > at org.apache.spark.sql. > {code} > This is a rather annoying bug since the SparkR documentation currently > suggests that dates are now supported in SparkR.
[jira] [Updated] (SPARK-32572) Run all the tests at once, instead of having separate entrypoints.
[ https://issues.apache.org/jira/browse/SPARK-32572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-32572: - Description: Started with this comment thread: https://github.com/apache/spark/pull/29121/files#r456683561 Each file is invoked separately and has a separate entry point: [https://github.com/apache/spark/blob/master/python/pyspark/ml/tests/test_wrapper.py#L120] We would replace [https://github.com/apache/spark/blob/master/dev/run-tests.py#L470] this function call to the subprocess with something that would invoke the python tests. > Run all the tests at once, instead of having separate entrypoints. > -- > > Key: SPARK-32572 > URL: https://issues.apache.org/jira/browse/SPARK-32572 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Fokko Driesprong >Priority: Major > > Started with this comment thread: > https://github.com/apache/spark/pull/29121/files#r456683561 > Each file is invoked separately and has a separate entry point: > [https://github.com/apache/spark/blob/master/python/pyspark/ml/tests/test_wrapper.py#L120] > We would replace > [https://github.com/apache/spark/blob/master/dev/run-tests.py#L470] this > function call to the subprocess with something that would invoke the python > tests.
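A sketch of the single-entry-point idea using stdlib unittest discovery; the directory path is illustrative, not Spark's actual layout:

```python
import unittest

def build_suite(start_dir="python/pyspark"):
    """Collect every test_*.py module under start_dir in one loader pass,
    instead of invoking each file as its own entry point.
    (The path is illustrative, not Spark's actual layout.)"""
    return unittest.TestLoader().discover(start_dir, pattern="test_*.py")

# A tiny in-memory TestCase, standing in for a discovered module:
class ExampleTest(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

def run_suite(suite):
    """Run one combined suite through a single TextTestRunner."""
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The combined suite replaces the per-file subprocess calls: one runner, one exit status.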
[jira] [Created] (SPARK-32572) Run all the tests at once, instead of having separate entrypoints.
Fokko Driesprong created SPARK-32572: Summary: Run all the tests at once, instead of having separate entrypoints. Key: SPARK-32572 URL: https://issues.apache.org/jira/browse/SPARK-32572 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong
[jira] [Updated] (SPARK-32319) Disallow the use of unused imports
[ https://issues.apache.org/jira/browse/SPARK-32319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-32319: - Summary: Disallow the use of unused imports (was: Remove unused imports) > Disallow the use of unused imports > -- > > Key: SPARK-32319 > URL: https://issues.apache.org/jira/browse/SPARK-32319 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Fokko Driesprong >Priority: Major > > We don't want to import stuff that we're not going to use, to reduce the > memory pressure.
[jira] [Resolved] (SPARK-32313) Remove Python 2 artifacts
[ https://issues.apache.org/jira/browse/SPARK-32313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved SPARK-32313. -- Resolution: Won't Fix > Remove Python 2 artifacts > - > > Key: SPARK-32313 > URL: https://issues.apache.org/jira/browse/SPARK-32313 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Fokko Driesprong >Priority: Major >
[jira] [Created] (SPARK-32320) Remove mutable default arguments
Fokko Driesprong created SPARK-32320: Summary: Remove mutable default arguments Key: SPARK-32320 URL: https://issues.apache.org/jira/browse/SPARK-32320 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong
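For context, the pitfall this issue targets (a generic Python illustration, not code from the Spark tree): a mutable default is created once, at function definition time, and shared across calls.

```python
# The pitfall: the default list is evaluated once and shared by every call.
def append_bad(x, xs=[]):
    xs.append(x)
    return xs

# The fix: use None as a sentinel and allocate a fresh list per call.
def append_good(x, xs=None):
    if xs is None:
        xs = []
    xs.append(x)
    return xs
```

Two successive calls to `append_bad` keep growing the same list, while `append_good` starts fresh each time.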
[jira] [Created] (SPARK-32319) Remove unused imports
Fokko Driesprong created SPARK-32319: Summary: Remove unused imports Key: SPARK-32319 URL: https://issues.apache.org/jira/browse/SPARK-32319 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong We don't want to import stuff that we're not going to use, to reduce the memory pressure.
[jira] [Closed] (SPARK-27757) Bump Jackson to 2.9.9
[ https://issues.apache.org/jira/browse/SPARK-27757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong closed SPARK-27757. Master is on Jackson 2.10 now :) > Bump Jackson to 2.9.9 > - > > Key: SPARK-27757 > URL: https://issues.apache.org/jira/browse/SPARK-27757 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Minor > Fix For: 3.0.0 > > > This fixes CVE-2019-12086 on Databind: > https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9
[jira] [Updated] (SPARK-17333) Make pyspark interface friendly with mypy static analysis
[ https://issues.apache.org/jira/browse/SPARK-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-17333: - Summary: Make pyspark interface friendly with mypy static analysis (was: Make pyspark interface friendly with static analysis) > Make pyspark interface friendly with mypy static analysis > - > > Key: SPARK-17333 > URL: https://issues.apache.org/jira/browse/SPARK-17333 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Assaf Mendelson >Priority: Trivial > > Static analysis tools such as those common to IDE for auto completion and > error marking, tend to have poor results with pyspark. > This is caused by two separate issues: > The first is that many elements are created programmatically such as the max > function in pyspark.sql.functions. > The second is that we tend to use pyspark in a functional manner, meaning > that we chain many actions (e.g. df.filter().groupby().agg()) and since > python has no type information this can become difficult to understand. > I would suggest changing the interface to improve it. > The way I see it we can either change the interface or provide interface > enhancements. > Changing the interface means defining (when possible) all functions directly, > i.e. instead of having a __functions__ dictionary in pyspark.sql.functions.py > and then generating the functions programmatically by using _create_function, > create the function directly. > def max(col): >""" >docstring >""" >_create_function(max,"docstring") > Second we can add type indications to all functions as defined in pep 484 or > pycharm's legacy type hinting > (https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy). > So for example max might look like this: > def max(col): >""" >does a max. > :type col: Column > :rtype Column >""" > This would provide a wide range of support as these types of hints, while old > are pretty common.
> A second option is to use PEP 3107 to define interfaces (pyi files) > in this case we might have a functions.pyi file which would contain something > like: > def max(col: Column) -> Column: > """ > Aggregate function: returns the maximum value of the expression in a > group. > """ > ... > This has the advantage of easier to understand types and not touching the > code (only supported code) but has the disadvantage of being separately > managed (i.e. greater chance of making a mistake) and the fact that some > configuration would be needed in the IDE/static analysis tool instead of > working out of the box.
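The inline-annotation style discussed above can be sketched as follows; the `Column` class here is a minimal stand-in, not pyspark's real one:

```python
# A stand-in Column class, so the example is self-contained.
class Column:
    def __init__(self, name: str) -> None:
        self.name = name

# Inline PEP 484 annotations: mypy and IDEs can now check and
# autocomplete chained calls. (Shadowing builtin max mirrors the
# pyspark.sql.functions situation described in the issue.)
def max(col: Column) -> Column:
    """Aggregate function: returns the maximum value of the expression."""
    return Column(f"max({col.name})")
```

With annotations in place, a chain like `df.filter().groupby().agg()` stays checkable end to end, which is the point of the issue.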
[jira] [Created] (SPARK-32313) Remove Python 2 artifacts
Fokko Driesprong created SPARK-32313: Summary: Remove Python 2 artifacts Key: SPARK-32313 URL: https://issues.apache.org/jira/browse/SPARK-32313 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong
[jira] [Created] (SPARK-32311) Remove duplicate import
Fokko Driesprong created SPARK-32311: Summary: Remove duplicate import Key: SPARK-32311 URL: https://issues.apache.org/jira/browse/SPARK-32311 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong
[jira] [Created] (SPARK-32309) Fix missing import
Fokko Driesprong created SPARK-32309: Summary: Fix missing import Key: SPARK-32309 URL: https://issues.apache.org/jira/browse/SPARK-32309 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Fokko Driesprong
[jira] [Created] (SPARK-31981) Keep TimestampType when taking an average of a Timestamp
Fokko Driesprong created SPARK-31981: Summary: Keep TimestampType when taking an average of a Timestamp Key: SPARK-31981 URL: https://issues.apache.org/jira/browse/SPARK-31981 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Fokko Driesprong Fix For: 3.1.0

Currently, when you take an average of a Timestamp, you'll end up with a Double representing the seconds since epoch. This is because of old Hive behavior. I strongly believe that it is better to return a Timestamp.

{code}
root@8c4241b617ec:/# psql postgres postgres
psql (12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.

postgres=# CREATE TABLE timestamp_demo (ts TIMESTAMP);
CREATE TABLE
postgres=# INSERT INTO timestamp_demo VALUES('2019-01-01 18:22:11');
INSERT 0 1
postgres=# INSERT INTO timestamp_demo VALUES('2018-01-01 18:22:11');
INSERT 0 1
postgres=# INSERT INTO timestamp_demo VALUES('2017-01-01 18:22:11');
INSERT 0 1
postgres=# SELECT AVG(ts) FROM timestamp_demo;
ERROR: function avg(timestamp without time zone) does not exist
LINE 1: SELECT AVG(ts) FROM timestamp_demo;
{code}

{code}
root@bab43a5731e8:/# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.20 MySQL Community Server - GPL

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE timestamp_demo (ts TIMESTAMP);
Query OK, 0 rows affected (0.05 sec)
mysql> INSERT INTO timestamp_demo VALUES('2019-01-01 18:22:11');
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO timestamp_demo VALUES('2018-01-01 18:22:11');
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO timestamp_demo VALUES('2017-01-01 18:22:11');
Query OK, 1 row affected (0.01 sec)
mysql> SELECT AVG(ts) FROM timestamp_demo;
+-----------------+
| AVG(ts)         |
+-----------------+
| 20180101182211. |
+-----------------+
1 row in set (0.00 sec)
{code}
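The proposed semantics can be sketched in plain Python (a hand-rolled illustration, not Spark's implementation): average the timestamps' microseconds since epoch, then convert back, so the result stays a timestamp rather than a Double.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def avg_timestamp(timestamps):
    """Average aware datetimes by averaging their epoch microseconds
    (exact integer arithmetic), then converting back to a datetime."""
    micros = [(ts - EPOCH) // timedelta(microseconds=1) for ts in timestamps]
    mean = sum(micros) // len(micros)
    return EPOCH + timedelta(microseconds=mean)
```

For the three rows above (2017/2018/2019-01-01 18:22:11) the result is exactly 2018-01-01 18:22:11, as a timestamp.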
[jira] [Reopened] (SPARK-10520) Dates cannot be summarised
[ https://issues.apache.org/jira/browse/SPARK-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong reopened SPARK-10520: -- This is the underlying issue of [https://github.com/apache/spark/pull/28554] Let me check if I can come up with a fix. > Dates cannot be summarised > -- > > Key: SPARK-10520 > URL: https://issues.apache.org/jira/browse/SPARK-10520 > Project: Spark > Issue Type: Bug > Components: PySpark, SparkR, SQL >Affects Versions: 1.5.0 >Reporter: Vincent Warmerdam >Priority: Major > Labels: bulk-closed > > I create a simple dataframe in R and call the summary function on it > (standard R, not SparkR). > {code} > > library(magrittr) > > df <- data.frame( > date = as.Date("2015-01-01") + 0:99, > r = runif(100) > ) > > df %>% summary > date r > Min. :2015-01-01 Min. :0.01221 > 1st Qu.:2015-01-25 1st Qu.:0.30003 > Median :2015-02-19 Median :0.46416 > Mean :2015-02-19 Mean :0.50350 > 3rd Qu.:2015-03-16 3rd Qu.:0.73361 > Max. :2015-04-10 Max. :0.99618 > {code} > Notice that the date can be summarised here. In SparkR, this will give an > error. > {code} > > ddf <- createDataFrame(sqlContext, df) > > ddf %>% summary > Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
> org.apache.spark.sql.AnalysisException: cannot resolve 'avg(date)' due to > data type mismatch: function average requires numeric types, not DateType; > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:61) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:292) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:290) > at org.apache.spark.sql. > {code} > This is a rather annoying bug since the SparkR documentation currently > suggests that dates are now supported in SparkR.
[jira] [Created] (SPARK-31735) Include all columns in the summary report
Fokko Driesprong created SPARK-31735: Summary: Include all columns in the summary report Key: SPARK-31735 URL: https://issues.apache.org/jira/browse/SPARK-31735 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 2.4.5 Reporter: Fokko Driesprong

Dates and other columns are excluded:

{code}
from datetime import datetime, timedelta, timezone
from pyspark.sql import types as T
from pyspark.sql import Row
from pyspark.sql import functions as F

START = datetime(2014, 1, 1, tzinfo=timezone.utc)
n_days = 22
date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
rdd = spark.sparkContext.parallelize(date_range)
df = spark.createDataFrame(data=rdd, schema=schema)

df.agg(F.max("date")).show()
df.summary().show()
{code}

{code}
+-------+
|summary|
+-------+
|  count|
|   mean|
| stddev|
|    min|
|    25%|
|    50%|
|    75%|
|    max|
+-------+
{code}
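Order-based statistics are well defined for dates even when mean/stddev are not, which is why the date column could still appear in the report. A plain-Python sketch (hand-rolled illustration, not Spark's summary implementation; the naive index-based percentiles are an assumption):

```python
from datetime import date, timedelta

def date_summary(dates):
    """Compute the order-based rows of a summary report for dates.
    mean/stddev are omitted: they need a numeric conversion first."""
    s = sorted(dates)
    n = len(s)
    return {
        "count": n,
        "min": s[0],
        "25%": s[n // 4],        # naive nearest-rank percentiles
        "50%": s[n // 2],
        "75%": s[(3 * n) // 4],
        "max": s[-1],
    }
```

For the 22-day range in the snippet above, this yields count 22, min 2014-01-01, max 2014-01-22.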
[jira] [Commented] (SPARK-29245) CCE during creating HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/SPARK-29245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081174#comment-17081174 ] Fokko Driesprong commented on SPARK-29245: -- Thanks, on Iceberg we have a similar issue: [https://github.com/apache/incubator-iceberg/pull/577] For reference. > CCE during creating HiveMetaStoreClient > > > Key: SPARK-29245 > URL: https://issues.apache.org/jira/browse/SPARK-29245 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > From `master` branch build, when I try to connect to an external HMS, I hit > the following. > {code} > 19/09/25 10:58:46 ERROR hive.log: Got exception: java.lang.ClassCastException > class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; > ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader > 'bootstrap') > java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to > class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module > java.base of loader 'bootstrap') > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:200) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70) > {code} > With HIVE-21508, I can get the following. > {code} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.4) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sql("show databases").show > ++ > |databaseName| > ++ > | . | > ... > {code} > With 2.3.7-SNAPSHOT, the following basic tests are tested.
> - SHOW DATABASES / TABLES > - DESC DATABASE / TABLE > - CREATE / DROP / USE DATABASE > - CREATE / DROP / INSERT / LOAD / SELECT TABLE
[jira] [Resolved] (SPARK-30103) Remove duplicate schema merge logic
[ https://issues.apache.org/jira/browse/SPARK-30103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved SPARK-30103. -- Resolution: Won't Fix > Remove duplicate schema merge logic > --- > > Key: SPARK-30103 > URL: https://issues.apache.org/jira/browse/SPARK-30103 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Fokko Driesprong >Priority: Major > > There is duplicate logic for merging two schemas: first in > StructType.merge() and second in catalyst using > TypeCoercion.findTightestCommonType(). My suggestion is to remove the first > one.
[jira] [Updated] (SPARK-27506) Function `from_avro` doesn't allow deserialization of data using other compatible schemas
[ https://issues.apache.org/jira/browse/SPARK-27506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-27506: - Fix Version/s: 3.0.0 > Function `from_avro` doesn't allow deserialization of data using other > compatible schemas > - > > Key: SPARK-27506 > URL: https://issues.apache.org/jira/browse/SPARK-27506 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gianluca Amori >Assignee: Fokko Driesprong >Priority: Major > Fix For: 3.0.0 > > > SPARK-24768 and subtasks introduced support to read and write Avro data by > parsing a binary column of Avro format and converting it into its > corresponding catalyst value (and vice versa). > > The current implementation has the limitation of requiring deserialization of > an event with the exact same schema with which it was serialized. This breaks > one of the most important features of Avro, schema evolution > [https://docs.confluent.io/current/schema-registry/avro.html] - most > importantly, the ability to read old data with a newer (compatible) schema > without breaking the consumer. > > The GenericDatumReader in the Avro library already supports passing an > optional *writer's schema* (the schema with which the record was serialized) > alongside a mandatory *reader's schema* (the schema with which the record is > going to be deserialized). The proposed change is to do the same in the > from_avro function, allowing the possibility to pass an optional writer's > schema to be used in the deserialization.
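The reader's/writer's-schema idea can be illustrated with a simplified, hand-rolled resolution function (this is NOT Avro's actual resolution algorithm; the dict-based schema and field names are hypothetical):

```python
def resolve_record(record, reader_fields):
    """Simplified schema-resolution sketch: reader_fields maps field
    name -> default value (None meaning 'required, no default').
    Fields missing from the old record take the reader's default;
    fields the reader doesn't know about are dropped."""
    out = {}
    for name, default in reader_fields.items():
        if name in record:
            out[name] = record[name]
        elif default is not None:
            out[name] = default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out
```

A record written as `{"id": 1}` under an old schema can still be read under a newer schema that adds a defaulted `email` field, which is the compatibility property the issue is about.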
[jira] [Created] (SPARK-30103) Remove duplicate schema merge logic
Fokko Driesprong created SPARK-30103: Summary: Remove duplicate schema merge logic Key: SPARK-30103 URL: https://issues.apache.org/jira/browse/SPARK-30103 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Fokko Driesprong Fix For: 3.0.0 There is duplicate logic for merging two schemas: first in StructType.merge() and second in Catalyst, using TypeCoercion.findTightestCommonType(). My suggestion is to remove the first one.
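Both mechanisms the issue describes boil down to the same idea, which can be sketched in plain Python (the names below are stand-ins, not Spark APIs): pick the narrowest type both inputs can be widened to, and apply that rule field by field when merging two schemas.

```python
# Illustrative sketch (plain Python, hypothetical names): a simplified
# "tightest common type" over a numeric promotion chain, plus a field-by-field
# schema merge built on top of it.

PROMOTION_CHAIN = ["byte", "short", "int", "long", "float", "double"]

def find_tightest_common_type(a, b):
    """Return the narrowest type that both a and b widen to, or None."""
    if a == b:
        return a
    if a in PROMOTION_CHAIN and b in PROMOTION_CHAIN:
        return PROMOTION_CHAIN[max(PROMOTION_CHAIN.index(a),
                                   PROMOTION_CHAIN.index(b))]
    return None  # incompatible types

def merge_schemas(left, right):
    """Merge two flat {field: type} schemas, widening where needed."""
    merged = dict(left)
    for name, right_type in right.items():
        if name not in merged:
            merged[name] = right_type
        else:
            common = find_tightest_common_type(merged[name], right_type)
            if common is None:
                raise ValueError(
                    f"cannot merge {merged[name]} and {right_type} for {name!r}")
            merged[name] = common
    return merged

print(merge_schemas({"id": "int", "score": "float"},
                    {"id": "long", "label": "int"}))
```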
[jira] [Updated] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType
[ https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-30004: - Component/s: (was: Spark Core) SQL > Allow UserDefinedType to be merged into a standard DateType > --- > > Key: SPARK-30004 > URL: https://issues.apache.org/jira/browse/SPARK-30004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Fokko Driesprong >Priority: Major > Fix For: 2.4.5 > > > I've registered a custom type, namely XMLGregorianCalendar, which is used by > Scalaxb, an XML data-binding tool that generates case classes from an XSD. I > want to convert the XMLGregorianCalendar to a regular TimestampType. > This works, but when I update the table (using Delta), I get an error: > Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and > 'START_DATE_MAINTENANCE_FLPL'. > Failed to merge incompatible data types TimestampType and > org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;; > There are two ways of fixing this: > * Adding a rule which compares the sqlType. > * Changing the compare function so it checks the rhs against the lhs, > allowing me to override the equals function on the UserDefinedType
[jira] [Created] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType
Fokko Driesprong created SPARK-30004: Summary: Allow UserDefinedType to be merged into a standard DateType Key: SPARK-30004 URL: https://issues.apache.org/jira/browse/SPARK-30004 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.4 Reporter: Fokko Driesprong Fix For: 2.4.5 I've registered a custom type, namely XMLGregorianCalendar, which is used by Scalaxb, an XML data-binding tool that generates case classes from an XSD. I want to convert the XMLGregorianCalendar to a regular TimestampType. This works, but when I update the table (using Delta), I get an error: Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and 'START_DATE_MAINTENANCE_FLPL'. Failed to merge incompatible data types TimestampType and org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;; There are two ways of fixing this: * Adding a rule which compares the sqlType. * Changing the compare function so it checks the rhs against the lhs, allowing me to override the equals function on the UserDefinedType
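The first proposed fix can be sketched in plain Python (illustrative, hypothetical names, not Spark's actual merge code): before declaring two field types incompatible, unwrap a user-defined type to its underlying sqlType and compare that instead.

```python
# Illustrative sketch (plain Python, hypothetical names): unwrap a
# user-defined type to its underlying SQL type before comparing, so a UDT
# backed by timestamp merges cleanly with a plain TimestampType field.

class UserDefinedType:
    """Stand-in for Spark's UserDefinedType: wraps an underlying SQL type."""
    def __init__(self, sql_type):
        self.sql_type = sql_type

def unwrap(dtype):
    """Reduce a UDT to its underlying sqlType; pass plain types through."""
    return dtype.sql_type if isinstance(dtype, UserDefinedType) else dtype

def merge_field_types(left, right):
    if unwrap(left) == unwrap(right):
        # Prefer the plain SQL type so the merged schema stays standard.
        return unwrap(left)
    raise ValueError(f"Failed to merge incompatible data types {left} and {right}")

# e.g. a CustomXMLGregorianCalendarType backed by a timestamp
xml_calendar = UserDefinedType("timestamp")
print(merge_field_types("timestamp", xml_calendar))
```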
[jira] [Created] (SPARK-29483) Bump Jackson to 2.10.0
Fokko Driesprong created SPARK-29483: Summary: Bump Jackson to 2.10.0 Key: SPARK-29483 URL: https://issues.apache.org/jira/browse/SPARK-29483 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.4 Reporter: Fokko Driesprong Fix For: 3.0.0 Fixes the following CVEs: https://www.cvedetails.com/cve/CVE-2019-16942/ https://www.cvedetails.com/cve/CVE-2019-16943/
[jira] [Created] (SPARK-29445) Bump netty-all from 4.1.39.Final to 4.1.42.Final
Fokko Driesprong created SPARK-29445: Summary: Bump netty-all from 4.1.39.Final to 4.1.42.Final Key: SPARK-29445 URL: https://issues.apache.org/jira/browse/SPARK-29445 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 2.4.4 Reporter: Fokko Driesprong https://www.cvedetails.com/cve/CVE-2019-16869/
[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926122#comment-16926122 ] Fokko Driesprong commented on SPARK-28921: -- I can confirm that we're running into the same issue with an on-premise k8s cluster with RBAC enabled. After updating the kubernetes client to 4.4.2, everything works fine again. > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0, 2.3.1, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.
> 19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.
[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.9.x
[ https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920685#comment-16920685 ] Fokko Driesprong commented on SPARK-27733: -- The regression issue has been resolved with the freshly released Avro 1.9.1. I'll look into the issues with the Hive dependency. > Upgrade to Avro 1.9.x > - > > Key: SPARK-27733 > URL: https://issues.apache.org/jira/browse/SPARK-27733 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.0.0 >Reporter: Ismaël Mejía >Priority: Minor > > Avro 1.9.0 was released with many nice features, including reduced size (1 MB > less), removed dependencies (no Paranamer, no shaded Guava), and security > updates, so it is probably a worthwhile upgrade.
[jira] [Updated] (SPARK-28728) Bump Jackson Databind to 2.9.9.3
[ https://issues.apache.org/jira/browse/SPARK-28728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-28728: - Description: (was: Due to CVE's: https://www.cvedetails.com/vulnerability-list/vendor_id-15866/product_id-42991/version_id-238179/opec-1/Fasterxml-Jackson-databind-2.9.0.html) > Bump Jackson Databind to 2.9.9.3 > > > Key: SPARK-28728 > URL: https://issues.apache.org/jira/browse/SPARK-28728 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Fokko Driesprong >Priority: Major > Fix For: 2.4.4, 3.0.0 >
[jira] [Updated] (SPARK-28728) Bump Jackson Databind to 2.9.9.3
[ https://issues.apache.org/jira/browse/SPARK-28728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-28728: - Description: Needs to be upgraded due to issues. > Bump Jackson Databind to 2.9.9.3 > > > Key: SPARK-28728 > URL: https://issues.apache.org/jira/browse/SPARK-28728 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Fokko Driesprong >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > Needs to be upgraded due to issues.
[jira] [Created] (SPARK-28728) Bump Jackson Databind to 2.9.9.3
Fokko Driesprong created SPARK-28728: Summary: Bump Jackson Databind to 2.9.9.3 Key: SPARK-28728 URL: https://issues.apache.org/jira/browse/SPARK-28728 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 2.4.3 Reporter: Fokko Driesprong Fix For: 2.4.4, 3.0.0 Due to CVEs: https://www.cvedetails.com/vulnerability-list/vendor_id-15866/product_id-42991/version_id-238179/opec-1/Fasterxml-Jackson-databind-2.9.0.html
[jira] [Created] (SPARK-28713) Bump checkstyle from 8.14 to 8.18
Fokko Driesprong created SPARK-28713: Summary: Bump checkstyle from 8.14 to 8.18 Key: SPARK-28713 URL: https://issues.apache.org/jira/browse/SPARK-28713 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 2.4.3 Reporter: Fokko Driesprong From the GitHub Security Advisory Database: Moderate severity vulnerability that affects com.puppycrawl.tools:checkstyle Checkstyle prior to 8.18 loads external DTDs by default, which can potentially lead to denial of service attacks or the leaking of confidential information. Affected versions: < 8.18
[jira] [Created] (SPARK-27757) Bump Jackson to 2.9.9
Fokko Driesprong created SPARK-27757: Summary: Bump Jackson to 2.9.9 Key: SPARK-27757 URL: https://issues.apache.org/jira/browse/SPARK-27757 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Fokko Driesprong This fixes CVE-2019-12086 on Databind: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9.9
[jira] [Created] (SPARK-25408) Move to idiomatic Java 8
Fokko Driesprong created SPARK-25408: Summary: Move to idiomatic Java 8 Key: SPARK-25408 URL: https://issues.apache.org/jira/browse/SPARK-25408 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.3.1 Reporter: Fokko Driesprong Java 8 has some nice features, such as try-with-resources and additions to the Collections library, which aren't used much in the Spark codebase. We might consider adopting them.
[jira] [Created] (SPARK-25033) Bump Apache commons.{httpclient, httpcore}
Fokko Driesprong created SPARK-25033: Summary: Bump Apache commons.{httpclient, httpcore} Key: SPARK-25033 URL: https://issues.apache.org/jira/browse/SPARK-25033 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: Fokko Driesprong I would like to bump the versions to bring them up to date with my other dependencies, in my case Stocator.
[jira] [Created] (SPARK-24603) Typo in comments
Fokko Driesprong created SPARK-24603: Summary: Typo in comments Key: SPARK-24603 URL: https://issues.apache.org/jira/browse/SPARK-24603 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: Fokko Driesprong The findTightestCommonTypeOfTwo function has been renamed to findTightestCommonType, but the comments still refer to the old name.
[jira] [Created] (SPARK-24601) Bump Jackson version to 2.9.6
Fokko Driesprong created SPARK-24601: Summary: Bump Jackson version to 2.9.6 Key: SPARK-24601 URL: https://issues.apache.org/jira/browse/SPARK-24601 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: Fokko Driesprong The Jackson version is lagging behind, and therefore I have to add a lot of exclusions to the SBT files: ``` Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.5 at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64) at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19) at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751) at org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala:82) at org.apache.spark.rdd.RDDOperationScope$.(RDDOperationScope.scala) ```
[jira] [Created] (SPARK-24520) Double braces in link
Fokko Driesprong created SPARK-24520: Summary: Double braces in link Key: SPARK-24520 URL: https://issues.apache.org/jira/browse/SPARK-24520 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.3.0 Reporter: Fokko Driesprong Double braces in the markdown, which break the link
[jira] [Created] (SPARK-23854) Update Guava to 16.0.1
Fokko Driesprong created SPARK-23854: Summary: Update Guava to 16.0.1 Key: SPARK-23854 URL: https://issues.apache.org/jira/browse/SPARK-23854 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.3.0 Reporter: Fokko Driesprong Currently Spark is still on Guava 14.0.1, and therefore I would like to bump the version to 16.0.1. Baby steps are important here, because we don't want to become incompatible with other technology stacks, but 14.0.1 is getting old.
[jira] [Created] (SPARK-22919) Bump Apache httpclient versions
Fokko Driesprong created SPARK-22919: Summary: Bump Apache httpclient versions Key: SPARK-22919 URL: https://issues.apache.org/jira/browse/SPARK-22919 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Reporter: Fokko Driesprong I would like to bump the PATCH versions of both Apache httpclient and Apache httpcore. I use the SparkTC Stocator library for connecting to an object store, and I would like to align the versions to reduce version mismatches. Furthermore, it is good to bump these versions since they fix stability and performance issues: https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt
[jira] [Created] (SPARK-16481) Spark does not update statistics when making use of Hive partitions
Fokko Driesprong created SPARK-16481: Summary: Spark does not update statistics when making use of Hive partitions Key: SPARK-16481 URL: https://issues.apache.org/jira/browse/SPARK-16481 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.1 Reporter: Fokko Driesprong Hi all, I've had some strange behaviour using Hive partitions. It turned out that, when using Hive partitions, the statistics of the Parquet files are not updated properly when inserting new data. I've isolated the issue in the following case: https://github.com/Fokko/spark-strange-refresh-behaviour The fix right now is to refresh the data by hand, which is quite error-prone as it can easily be forgotten. Cheers, Fokko Driesprong.
[jira] [Updated] (SPARK-12869) Optimize conversion from BlockMatrix to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-12869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-12869: - Flags: Patch Affects Version/s: 1.6.0 Target Version/s: 1.6.1 Fix Version/s: 1.6.1 > Optimize conversion from BlockMatrix to IndexedRowMatrix > > > Key: SPARK-12869 > URL: https://issues.apache.org/jira/browse/SPARK-12869 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Fokko Driesprong > Fix For: 1.6.1 > > Original Estimate: 48h > Remaining Estimate: 48h > > In the current implementation of the BlockMatrix, the conversion to the > IndexedRowMatrix is done by converting it to a CoordinateMatrix first. This > is somewhat ok when the matrix is very sparse, but for dense matrices this is > very inefficient.
[jira] [Commented] (SPARK-12869) Optimize conversion from BlockMatrix to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-12869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107533#comment-15107533 ] Fokko Driesprong commented on SPARK-12869: -- Hi guys, I've implemented an improved version of the toIndexedRowMatrix function on the BlockMatrix. I needed this for a project, but would like to share it with the rest of the community. In the case of dense matrices, it can increase performance by up to 19 times: https://github.com/Fokko/BlockMatrixToIndexedRowMatrix The pull request on GitHub: https://github.com/apache/spark/pull/10839 > Optimize conversion from BlockMatrix to IndexedRowMatrix > > > Key: SPARK-12869 > URL: https://issues.apache.org/jira/browse/SPARK-12869 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Fokko Driesprong > Original Estimate: 48h > Remaining Estimate: 48h > > In the current implementation of the BlockMatrix, the conversion to the > IndexedRowMatrix is done by converting it to a CoordinateMatrix first. This > is somewhat ok when the matrix is very sparse, but for dense matrices this is > very inefficient.
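The idea behind the optimization can be sketched in plain Python (illustrative only, not Spark's MLlib code): converting block-partitioned data straight to indexed rows copies one contiguous slice per block row, whereas the CoordinateMatrix route first explodes every cell into an (i, j, value) triple, which is wasteful for dense matrices.

```python
# Illustrative sketch (plain Python, hypothetical names): direct conversion
# from a block-partitioned matrix to indexed rows, skipping the per-cell
# coordinate representation.

def blocks_to_indexed_rows(blocks, rows_per_block, cols_per_block, n_cols):
    """blocks: {(block_row, block_col): 2-D list}. Returns {row_index: row}."""
    rows = {}
    for (bi, bj), block in blocks.items():
        for r, slice_ in enumerate(block):
            i = bi * rows_per_block + r           # global row index
            row = rows.setdefault(i, [0.0] * n_cols)
            for c, v in enumerate(slice_):
                row[bj * cols_per_block + c] = v  # place slice at column offset
    return rows

# A 2x3 matrix stored as two blocks side by side.
blocks = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (0, 1): [[5.0], [6.0]],
}
print(blocks_to_indexed_rows(blocks, rows_per_block=2, cols_per_block=2, n_cols=3))
```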
[jira] [Created] (SPARK-12869) Optimize conversion from BlockMatrix to IndexedRowMatrix
Fokko Driesprong created SPARK-12869: Summary: Optimize conversion from BlockMatrix to IndexedRowMatrix Key: SPARK-12869 URL: https://issues.apache.org/jira/browse/SPARK-12869 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Fokko Driesprong In the current implementation of the BlockMatrix, the conversion to the IndexedRowMatrix is done by converting it to a CoordinateMatrix first. This is somewhat ok when the matrix is very sparse, but for dense matrices this is very inefficient.