[jira] [Commented] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions
[ https://issues.apache.org/jira/browse/SPARK-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459427#comment-16459427 ]

Apache Spark commented on SPARK-24131:

User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/21203

> Key: SPARK-24131
> URL: https://issues.apache.org/jira/browse/SPARK-24131
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 2.4.0
> Reporter: Liang-Chi Hsieh
> Priority: Minor
>
> We need to determine Spark major and minor versions in PySpark. We can add a
> {{majorMinorVersion}} API to PySpark which is similar to the API in
> {{VersionUtils.majorMinorVersion}}.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions
[ https://issues.apache.org/jira/browse/SPARK-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24131:

Assignee: Apache Spark
[jira] [Assigned] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions
[ https://issues.apache.org/jira/browse/SPARK-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24131:

Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions
Liang-Chi Hsieh created SPARK-24131:

Summary: Add majorMinorVersion API to PySpark for determining Spark versions
Key: SPARK-24131
URL: https://issues.apache.org/jira/browse/SPARK-24131
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 2.4.0
Reporter: Liang-Chi Hsieh
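The proposed helper can be sketched in plain Python. The function name follows the issue; the parsing details are an assumption, not the actual PySpark implementation.

```python
import re

def majorMinorVersion(spark_version):
    # Extract (major, minor) from a version string such as "2.4.0" or
    # "2.3.1-SNAPSHOT", mirroring Scala's VersionUtils.majorMinorVersion.
    # The regex and error handling here are illustrative assumptions.
    m = re.match(r"^(\d+)\.(\d+)(\..*)?$", spark_version)
    if m is None:
        raise ValueError("Unrecognized Spark version: " + spark_version)
    return int(m.group(1)), int(m.group(2))

print(majorMinorVersion("2.4.0"))  # (2, 4)
```

Suffix labels such as "-SNAPSHOT" are tolerated because only the leading "major.minor" portion is matched.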
[jira] [Created] (SPARK-24130) Data Source V2: Join Push Down
Jia Li created SPARK-24130:

Summary: Data Source V2: Join Push Down
Key: SPARK-24130
URL: https://issues.apache.org/jira/browse/SPARK-24130
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 2.3.0
Reporter: Jia Li

Spark applications often directly query external data sources such as relational databases or files. Spark provides Data Source APIs for accessing structured data through Spark SQL. The Data Source APIs in both V1 and V2 support optimizations such as filter push down and column pruning, which are a subset of the functionality that can be pushed down to some data sources. We're proposing to extend the Data Source APIs with join push down (JPD). Join push down significantly improves query performance by reducing the amount of data transferred and by exploiting the capabilities of the data sources, such as index access. The join push down design document is available [here|https://docs.google.com/document/d/1k-kRadTcUbxVfUQwqBbIXs_yPZMxh18-e-cz77O_TaE/edit?usp=sharing].
[jira] [Resolved] (SPARK-23853) Skip doctests which require hive support built in PySpark
[ https://issues.apache.org/jira/browse/SPARK-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-23853.

Resolution: Fixed
Fix Version/s: 2.3.1, 2.4.0

Issue resolved by pull request 21141 [https://github.com/apache/spark/pull/21141]

> Key: SPARK-23853
> URL: https://issues.apache.org/jira/browse/SPARK-23853
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.4.0
> Reporter: holdenk
> Assignee: Dongjoon Hyun
> Priority: Trivial
> Fix For: 2.3.1, 2.4.0
>
> As we do when detecting whether various libraries are installed: if there is no
> Hive support built in, we should skip the tests which require it,
> e.g. the readwrite doctest.
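A minimal sketch of the selection logic: drop the doctests known to need Hive support when the build lacks it, the same way PySpark skips tests for missing optional libraries. The function and test names are illustrative, not the actual test-harness code.

```python
def select_doctests(all_tests, hive_available):
    # Doctests that exercise Hive-backed functionality (e.g. the readwrite
    # doctest) are dropped when the build does not include Hive support.
    hive_only = {"readwrite"}
    if hive_available:
        return set(all_tests)
    return set(all_tests) - hive_only

# Without Hive support, only the Hive-independent doctests remain.
print(sorted(select_doctests({"dataframe", "readwrite"}, hive_available=False)))
# ['dataframe']
```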
[jira] [Assigned] (SPARK-23853) Skip doctests which require hive support built in PySpark
[ https://issues.apache.org/jira/browse/SPARK-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-23853:

Assignee: Dongjoon Hyun
[jira] [Assigned] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh
[ https://issues.apache.org/jira/browse/SPARK-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24129:

Assignee: (was: Apache Spark)

> Key: SPARK-24129
> URL: https://issues.apache.org/jira/browse/SPARK-24129
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 2.4.0
> Reporter: Devaraj K
> Priority: Minor
>
> When working behind a firewall, we may need to pass proxy details as docker
> --build-arg parameters to build the image, but docker-image-tool.sh doesn't
> provide an option to pass the proxy details or --build-arg to the docker
> command.
[jira] [Commented] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh
[ https://issues.apache.org/jira/browse/SPARK-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459349#comment-16459349 ]

Apache Spark commented on SPARK-24129:

User 'devaraj-kavali' has created a pull request for this issue: https://github.com/apache/spark/pull/21202
[jira] [Assigned] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh
[ https://issues.apache.org/jira/browse/SPARK-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24129:

Assignee: Apache Spark
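A sketch (in Python) of what the pass-through might look like: each user-supplied NAME=VALUE pair is forwarded to `docker build` as a `--build-arg`. The plumbing here is an assumption about one possible design, not the actual docker-image-tool.sh change.

```python
def docker_build_command(tag, dockerfile, build_args):
    # Forward each NAME=VALUE pair (e.g. proxy settings) to docker build.
    cmd = ["docker", "build", "-t", tag, "-f", dockerfile]
    for arg in build_args:
        cmd += ["--build-arg", arg]
    cmd.append(".")
    return cmd

print(" ".join(docker_build_command(
    "spark:latest", "Dockerfile",
    ["http_proxy=http://proxy.example.com:3128"])))
```

With an empty `build_args` list the command degenerates to a plain `docker build`, so existing invocations would be unaffected.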
[jira] [Updated] (SPARK-23940) High-order function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2>
[ https://issues.apache.org/jira/browse/SPARK-23940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated SPARK-23940:

Summary: High-order function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2> (was: High-ofer function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2>)

> Key: SPARK-23940
> URL: https://issues.apache.org/jira/browse/SPARK-23940
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Xiao Li
> Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/map.html
> Returns a map that applies the function to each entry of the map and transforms the values.
> {noformat}
> SELECT transform_values(MAP(ARRAY[], ARRAY[]), (k, v) -> v + 1); -- {}
> SELECT transform_values(MAP(ARRAY [1, 2, 3], ARRAY [10, 20, 30]), (k, v) -> v + k); -- {1 -> 11, 2 -> 22, 3 -> 33}
> SELECT transform_values(MAP(ARRAY [1, 2, 3], ARRAY ['a', 'b', 'c']), (k, v) -> k * k); -- {1 -> 1, 2 -> 4, 3 -> 9}
> SELECT transform_values(MAP(ARRAY ['a', 'b'], ARRAY [1, 2]), (k, v) -> k || CAST(v as VARCHAR)); -- {a -> a1, b -> b2}
> SELECT transform_values(MAP(ARRAY [1, 2], ARRAY [1.0, 1.4]), -- {1 -> one_1.0, 2 -> two_1.4}
>                         (k, v) -> MAP(ARRAY[1, 2], ARRAY['one', 'two'])[k] || '_' || CAST(v AS VARCHAR));
> {noformat}
[jira] [Commented] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
[ https://issues.apache.org/jira/browse/SPARK-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459333#comment-16459333 ]

Apache Spark commented on SPARK-24128:

User 'henryr' has created a pull request for this issue: https://github.com/apache/spark/pull/21201

> Key: SPARK-24128
> URL: https://issues.apache.org/jira/browse/SPARK-24128
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Henry Robinson
> Priority: Minor
>
> The error message given when a query contains an implicit cartesian product
> suggests rewriting the query using {{CROSS JOIN}}, but not disabling the
> check using {{spark.sql.crossJoin.enabled=true}}. It's sometimes easier to
> change a config variable than edit a query, so it would be helpful to make
> the user aware of their options.
[jira] [Assigned] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
[ https://issues.apache.org/jira/browse/SPARK-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24128:

Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
[ https://issues.apache.org/jira/browse/SPARK-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24128:

Assignee: Apache Spark
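A sketch of an error message that mentions both remedies, as the issue proposes; the exact wording is illustrative, not the actual Spark message.

```python
def cartesian_product_error(left, right):
    # Mention both fixes: rewrite with an explicit CROSS JOIN, or flip the
    # spark.sql.crossJoin.enabled config to disable the check entirely.
    return ("Detected implicit cartesian product for join between {0} and {1}. "
            "Either rewrite the query to use an explicit CROSS JOIN, or set "
            "spark.sql.crossJoin.enabled=true to disable this check."
            .format(left, right))

print(cartesian_product_error("orders", "customers"))
```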
[jira] [Created] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh
Devaraj K created SPARK-24129:

Summary: Add option to pass --build-arg's to docker-image-tool.sh
Key: SPARK-24129
URL: https://issues.apache.org/jira/browse/SPARK-24129
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Devaraj K
[jira] [Created] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
Henry Robinson created SPARK-24128:

Summary: Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
Key: SPARK-24128
URL: https://issues.apache.org/jira/browse/SPARK-24128
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Henry Robinson
[jira] [Assigned] (SPARK-24039) remove restarting iterators hack
[ https://issues.apache.org/jira/browse/SPARK-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24039:

Assignee: Apache Spark

> Key: SPARK-24039
> URL: https://issues.apache.org/jira/browse/SPARK-24039
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: Jose Torres
> Assignee: Apache Spark
> Priority: Major
>
> Currently, continuous processing execution calls next() to restart the query
> iterator after it returns false. This doesn't work for complex RDDs - we need
> to call compute() instead.
> This isn't refactoring-only; changes will be required to keep the reader from
> starting over in each compute() call.
[jira] [Commented] (SPARK-24039) remove restarting iterators hack
[ https://issues.apache.org/jira/browse/SPARK-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459208#comment-16459208 ]

Apache Spark commented on SPARK-24039:

User 'jose-torres' has created a pull request for this issue: https://github.com/apache/spark/pull/21200
[jira] [Assigned] (SPARK-24039) remove restarting iterators hack
[ https://issues.apache.org/jira/browse/SPARK-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24039:

Assignee: (was: Apache Spark)
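The problem can be illustrated in plain Python: an exhausted iterator stays exhausted, so "restarting" it by calling next() again cannot work in general; a restart has to recompute the underlying data, which is what calling compute() on the RDD would do. This is an analogy, not Spark's code.

```python
def compute():
    # Stand-in for RDD.compute(): builds a fresh iterator over the partition.
    return iter([1, 2, 3])

it = compute()
print(list(it))   # [1, 2, 3]
print(list(it))   # [] -- exhausted; further next() calls cannot restart it
it = compute()    # restarting requires recomputing the iterator
print(list(it))   # [1, 2, 3]
```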
[jira] [Commented] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date
[ https://issues.apache.org/jira/browse/SPARK-23549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459136#comment-16459136 ]

Dongjoon Hyun commented on SPARK-23549:

[~emlyn]. This issue changes the behavior and introduces a new conf, `spark.sql.hive.compareDateTimestampInTimestamp`. I don't think this will be included in Spark 2.3.1.

> Key: SPARK-23549
> URL: https://issues.apache.org/jira/browse/SPARK-23549
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
> Reporter: Dong Jiang
> Assignee: Kazuaki Ishizaki
> Priority: Major
> Fix For: 2.4.0
>
> {code:java}
> scala> spark.version
> res1: String = 2.2.1
>
> scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
> +---+
> |((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
> +---+
> |false|
> +---+
> {code}
> As shown above, when a timestamp is compared to a date in Spark SQL, both the
> timestamp and the date are downcast to strings, leading to an unexpected result.
> If I run the same SQL in Presto/Athena, I get the expected result:
> {code:java}
> select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
> _col0
> 1 true
> {code}
> Is this a bug in Spark or a feature?
[jira] [Assigned] (SPARK-24127) Support text socket source in continuous mode
[ https://issues.apache.org/jira/browse/SPARK-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24127:

Assignee: (was: Apache Spark)

> Key: SPARK-24127
> URL: https://issues.apache.org/jira/browse/SPARK-24127
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: Arun Mahadevan
> Priority: Minor
>
> Currently the text socket source is supported only in structured streaming
> micro-batch mode. Supporting it in continuous mode enables running structured
> streaming continuous pipelines where one can ingest data via "nc" and run
> examples.
[jira] [Commented] (SPARK-24127) Support text socket source in continuous mode
[ https://issues.apache.org/jira/browse/SPARK-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459129#comment-16459129 ]

Apache Spark commented on SPARK-24127:

User 'arunmahadevan' has created a pull request for this issue: https://github.com/apache/spark/pull/21199
[jira] [Assigned] (SPARK-24127) Support text socket source in continuous mode
[ https://issues.apache.org/jira/browse/SPARK-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24127:

Assignee: Apache Spark
[jira] [Created] (SPARK-24127) Support text socket source in continuous mode
Arun Mahadevan created SPARK-24127:

Summary: Support text socket source in continuous mode
Key: SPARK-24127
URL: https://issues.apache.org/jira/browse/SPARK-24127
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 2.4.0
Reporter: Arun Mahadevan
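The source consumes newline-delimited text over TCP, the same stream you can feed with `nc -lk 9999`. A plain-Python sketch of that consumption (for illustration only, not Spark's socket source implementation):

```python
import socket

def read_lines(host, port, max_lines):
    # Connect to a text socket and collect up to max_lines newline-delimited
    # UTF-8 lines -- the framing the text socket source expects.
    buf, lines = b"", []
    with socket.create_connection((host, port)) as conn:
        while len(lines) < max_lines:
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
            while b"\n" in buf and len(lines) < max_lines:
                line, buf = buf.split(b"\n", 1)
                lines.append(line.decode("utf-8"))
    return lines
```

To try it, run `nc -lk 9999` in one terminal, type a few lines, and call `read_lines("localhost", 9999, 3)` in another.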
[jira] [Commented] (SPARK-24115) improve instrumentation for spark.ml.tuning
[ https://issues.apache.org/jira/browse/SPARK-24115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459033#comment-16459033 ]

Joseph K. Bradley commented on SPARK-24115:

Sounds good; go ahead.

> Key: SPARK-24115
> URL: https://issues.apache.org/jira/browse/SPARK-24115
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Affects Versions: 2.3.0
> Reporter: yogesh garg
> Priority: Major
[jira] [Assigned] (SPARK-24003) Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's
[ https://issues.apache.org/jira/browse/SPARK-24003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-24003:

Assignee: Devaraj K

> Key: SPARK-24003
> URL: https://issues.apache.org/jira/browse/SPARK-24003
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, Spark Core, YARN
> Affects Versions: 2.3.0
> Reporter: Devaraj K
> Assignee: Devaraj K
> Priority: Major
> Fix For: 2.4.0
>
> Users may want to enable GC logging or heap dumps for the executors, but
> there is a chance of one executor overwriting another's output, since the
> paths cannot be expressed dynamically. This improvement makes it possible to
> express the spark.executor.extraJavaOptions paths in terms of the App Id and
> Executor Id, to avoid overwriting by other executors.
> There was a discussion about this in SPARK-3767, but it was never fixed.
[jira] [Resolved] (SPARK-24003) Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's
[ https://issues.apache.org/jira/browse/SPARK-24003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-24003.

Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21088 [https://github.com/apache/spark/pull/21088]
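The fix lets identifiers in the option string be expanded per executor, so paths no longer collide. A sketch of that substitution; the {{APP_ID}}/{{EXECUTOR_ID}} token names here are assumptions about the placeholder syntax, so check the merged pull request for the actual form.

```python
def expand_java_options(opts, app_id, executor_id):
    # Substitute per-application and per-executor identifiers so that, e.g.,
    # GC logs and heap dumps from different executors land in distinct files.
    # The token names are hypothetical placeholders for this sketch.
    return (opts.replace("{{APP_ID}}", app_id)
                .replace("{{EXECUTOR_ID}}", executor_id))

opts = "-Xloggc:/tmp/gc-{{APP_ID}}-{{EXECUTOR_ID}}.log"
print(expand_java_options(opts, "app-20180430-0001", "3"))
# -Xloggc:/tmp/gc-app-20180430-0001-3.log
```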
[jira] [Updated] (SPARK-23781) Merge YARN and Mesos token renewal code
[ https://issues.apache.org/jira/browse/SPARK-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin updated SPARK-23781:

Component/s: (was: yan) YARN

> Key: SPARK-23781
> URL: https://issues.apache.org/jira/browse/SPARK-23781
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, YARN
> Affects Versions: 2.4.0
> Reporter: Marcelo Vanzin
> Priority: Major
>
> With the fix for SPARK-23361, the code that handles delegation tokens in
> Mesos and YARN ends up being very similar.
> We should refactor that code so that both backends share the same code,
> which would also make it easier for other cluster managers to use it.
[jira] [Commented] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp
[ https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459000#comment-16459000 ]

Apache Spark commented on SPARK-24126:

User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/21198

> Key: SPARK-24126
> URL: https://issues.apache.org/jira/browse/SPARK-24126
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Tests
> Affects Versions: 2.4.0
> Reporter: Marcelo Vanzin
> Priority: Minor
>
> When you run PySpark tests, they leave a lot of garbage in /tmp. The test
> code should do a better job of cleaning up after itself, and also try to keep
> things under the build directory so that things like "mvn clean" or "git
> clean" can do their thing.
[jira] [Assigned] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp
[ https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24126:

Assignee: Apache Spark
[jira] [Assigned] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp
[ https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24126: Assignee: (was: Apache Spark) > PySpark tests leave a lot of garbage in /tmp > > > Key: SPARK-24126 > URL: https://issues.apache.org/jira/browse/SPARK-24126 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 2.4.0 >Reporter: Marcelo Vanzin >Priority: Minor > > When you run pyspark tests, they leave a lot of garbage in /tmp. The test > code should do a better job at cleaning up after itself, and also try to keep > things under the build directory so that things like "mvn clean" or "git > clean" can do their thing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp
[ https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458974#comment-16458974 ] Marcelo Vanzin commented on SPARK-24126: BTW I'm testing changes to implement this. > PySpark tests leave a lot of garbage in /tmp > > > Key: SPARK-24126 > URL: https://issues.apache.org/jira/browse/SPARK-24126 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 2.4.0 >Reporter: Marcelo Vanzin >Priority: Minor > > When you run pyspark tests, they leave a lot of garbage in /tmp. The test > code should do a better job at cleaning up after itself, and also try to keep > things under the build directory so that things like "mvn clean" or "git > clean" can do their thing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp
Marcelo Vanzin created SPARK-24126: -- Summary: PySpark tests leave a lot of garbage in /tmp Key: SPARK-24126 URL: https://issues.apache.org/jira/browse/SPARK-24126 Project: Spark Issue Type: Improvement Components: PySpark, Tests Affects Versions: 2.4.0 Reporter: Marcelo Vanzin When you run pyspark tests, they leave a lot of garbage in /tmp. The test code should do a better job at cleaning up after itself, and also try to keep things under the build directory so that things like "mvn clean" or "git clean" can do their thing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
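The cleanup described above can be sketched in Python. The helper name and the `SPARK_HOME/target/tmp` location are assumptions for illustration, not the actual patch: the idea is simply to create test temp directories under the build tree (so `mvn clean` / `git clean` remove them) and to delete them when the test process exits.

```python
import atexit
import os
import shutil
import tempfile

def make_test_tempdir(prefix="pyspark-test-"):
    # Keep temp dirs under the build directory (assumed SPARK_HOME/target/tmp)
    # instead of /tmp, so build-clean tooling can sweep them up.
    root = os.path.join(os.environ.get("SPARK_HOME", "."), "target", "tmp")
    os.makedirs(root, exist_ok=True)
    path = tempfile.mkdtemp(prefix=prefix, dir=root)
    # Best-effort removal at interpreter exit, even if a test forgets to clean up.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

Tests would call `make_test_tempdir()` wherever they previously used `tempfile.mkdtemp()` with the default `/tmp` location.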
[jira] [Created] (SPARK-24125) Add quoting rules to SQL guide
Henry Robinson created SPARK-24125: -- Summary: Add quoting rules to SQL guide Key: SPARK-24125 URL: https://issues.apache.org/jira/browse/SPARK-24125 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Henry Robinson As far as I can tell, Spark SQL's quoting rules are as follows: * {{`foo bar`}} is an identifier * {{'foo bar'}} is a string literal * {{"foo bar"}} is a string literal The last of these is non-standard (usually {{"foo bar"}} is an identifier), and so it's probably worth mentioning these rules in the 'reference' section of the [SQL guide|http://spark.apache.org/docs/latest/sql-programming-guide.html#reference]. I'm assuming there's not a lot of enthusiasm to change the quoting rules, given it would be a breaking change, and that backticks work just fine as an alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
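The three quoting rules can be illustrated with a pair of hypothetical Python helpers. The escaping conventions here (doubling embedded backticks in identifiers, backslash-escaping quotes in string literals) are assumptions based on common Hive-style behavior, not taken from the SQL guide itself:

```python
def quote_identifier(name: str) -> str:
    # Spark SQL identifiers are wrapped in backticks;
    # an embedded backtick is doubled.
    return "`" + name.replace("`", "``") + "`"

def quote_string_literal(value: str) -> str:
    # Single-quote form of a string literal; backslashes and embedded
    # single quotes are backslash-escaped (assumed Hive-style escaping).
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
```

For example, `quote_identifier("foo bar")` yields `` `foo bar` ``, making the column name usable where a bare `"foo bar"` would instead parse as a string literal.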
[jira] [Commented] (SPARK-23971) Should not leak Spark sessions across test suites
[ https://issues.apache.org/jira/browse/SPARK-23971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458959#comment-16458959 ] Apache Spark commented on SPARK-23971: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/21197 > Should not leak Spark sessions across test suites > - > > Key: SPARK-23971 > URL: https://issues.apache.org/jira/browse/SPARK-23971 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.4.0 > > > Many suites currently leak Spark sessions (sometimes with stopped > SparkContexts) via the thread-local active Spark session and default Spark > session. We should attempt to clean these up and detect when this happens to > improve the reproducibility of tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
[ https://issues.apache.org/jira/browse/SPARK-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24123: Assignee: (was: Apache Spark) > Fix a flaky test `DateTimeUtilsSuite.monthsBetween` > --- > > Key: SPARK-24123 > URL: https://issues.apache.org/jira/browse/SPARK-24123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Minor > > **MASTER BRANCH** > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/ > {code} > Error Message > 3.949596773820191 did not equal 3.9495967741935485 > Stacktrace > org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not > equal 3.9495967741935485 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
[ https://issues.apache.org/jira/browse/SPARK-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458907#comment-16458907 ] Apache Spark commented on SPARK-24123: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/21196 > Fix a flaky test `DateTimeUtilsSuite.monthsBetween` > --- > > Key: SPARK-24123 > URL: https://issues.apache.org/jira/browse/SPARK-24123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Minor > > **MASTER BRANCH** > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/ > {code} > Error Message > 3.949596773820191 did not equal 3.9495967741935485 > Stacktrace > org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not > equal 3.9495967741935485 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
[ https://issues.apache.org/jira/browse/SPARK-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24123: Assignee: Apache Spark > Fix a flaky test `DateTimeUtilsSuite.monthsBetween` > --- > > Key: SPARK-24123 > URL: https://issues.apache.org/jira/browse/SPARK-24123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor > > **MASTER BRANCH** > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/ > {code} > Error Message > 3.949596773820191 did not equal 3.9495967741935485 > Stacktrace > org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not > equal 3.9495967741935485 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
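The failure above is an exact floating-point equality assertion. A typical fix for this kind of flakiness (the linked PR may differ in detail) is to compare within a tolerance rather than exactly. A minimal Python sketch using the reported values, with an assumed tolerance of 1e-8:

```python
import math

got = 3.949596773820191        # value computed by the test run
expected = 3.9495967741935485  # value the assertion expected

# Exact equality fails for values that differ only in low-order bits:
assert got != expected

# Comparing within a tolerance makes the check robust to platform- and
# timezone-dependent rounding differences:
assert math.isclose(got, expected, abs_tol=1e-8)
```

The two values differ by roughly 4e-10, so any reasonable tolerance accepts them while still catching genuinely wrong results.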
[jira] [Commented] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly
[ https://issues.apache.org/jira/browse/SPARK-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458904#comment-16458904 ] Marcelo Vanzin commented on SPARK-24124: This should be fine. > Spark history server should create spark.history.store.path and set > permissions properly > > > Key: SPARK-24124 > URL: https://issues.apache.org/jira/browse/SPARK-24124 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Major > > Current with the new spark history server you can set > spark.history.store.path to a location to store the levelDB files. Currently > the directory has to be made before it can use that path. > We should just have the history server create it and set the file permissions > on the leveldb files to be restrictive -> new FsPermission((short) 0700) > the shuffle service already does this, this would be much more convenient to > use and prevent people from making mistakes with the permissions on the > directory and files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23530) It's not appropriate to let the original master exit while the leader of zookeeper shutdown
[ https://issues.apache.org/jira/browse/SPARK-23530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458895#comment-16458895 ] Ashwin Agate commented on SPARK-23530: -- Can we please increase the priority of this bug, since it exists in the latest Spark 2.3.0 too? We have observed this during an upgrade scenario (with Spark 1.6.3), where we have to shut down ZooKeeper, which has the adverse side effect of the Spark master shutting down on other nodes, which is far from ideal. BTW, https://issues.apache.org/jira/browse/SPARK-15544 is a similar issue filed for Spark 1.6.1. > It's not appropriate to let the original master exit while the leader of > zookeeper shutdown > --- > > Key: SPARK-23530 > URL: https://issues.apache.org/jira/browse/SPARK-23530 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.1, 2.3.0 >Reporter: liuxianjiao >Priority: Critical > > When the leader of zookeeper shuts down, the current behavior of Spark is to > let the master exit to revoke its leadership. However, this sacrifices a master > node. Following the approach of Hadoop and Storm, we should let the original > active master become standby, re-elect a Spark master, or find some other way > to revoke leadership gracefully. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645 ] Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM: --- Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal. {code:java} Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it. Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it. Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2 Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630 Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630 Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked – master shutting down.{code} was (Author: agateaaa): {code:java} Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal. 
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it. Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it. Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2 Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630 Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630 Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked – master shutting down.{code} > Bouncing Zookeeper node causes Active spark master to exit > -- > > Key: SPARK-15544 > URL: https://issues.apache.org/jira/browse/SPARK-15544 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.04. Zookeeper 3.4.6 with 3-node quorum >Reporter: Steven Lowenthal >Priority: Major > > Shutting Down a single zookeeper node caused spark master to exit. The > master should have connected to a second zookeeper node. 
> {code:title=log output} > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138 > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129 > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x154dfc0426b0054, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x254c701f28d0053, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost > leadership > 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master > shutting down. }} > {code} > spark-env.sh: > {code:title=spark-env.sh} > export SPARK_LOCAL_DIRS=/ephemeral/spark/local > export
[jira] [Commented] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645 ] agate commented on SPARK-15544: --- Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal. > Bouncing Zookeeper node causes Active spark master to exit > -- > > Key: SPARK-15544 > URL: https://issues.apache.org/jira/browse/SPARK-15544 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.04. Zookeeper 3.4.6 with 3-node quorum >Reporter: Steven Lowenthal >Priority: Major > > Shutting Down a single zookeeper node caused spark master to exit. The > master should have connected to a second zookeeper node. > {code:title=log output} > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138 > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129 > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x154dfc0426b0054, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x254c701f28d0053, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost > leadership > 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master > shutting 
down. }} > {code} > spark-env.sh: > {code:title=spark-env.sh} > export SPARK_LOCAL_DIRS=/ephemeral/spark/local > export SPARK_WORKER_DIR=/ephemeral/spark/work > export SPARK_LOG_DIR=/var/log/spark > export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop > export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER > -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181" > export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true" > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
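A pure-Python sketch of the graceful alternative the reporters describe: on a transient ZooKeeper SUSPENDED event, the master steps down to standby and waits for reconnection or re-election, exiting only on a definitive loss. The state and event names are illustrative, not Spark's actual implementation:

```python
class MasterState:
    ALIVE, STANDBY, DEAD = "ALIVE", "STANDBY", "DEAD"

class Master:
    """Toy model of a Spark master reacting to ZooKeeper connection events."""

    def __init__(self):
        self.state = MasterState.ALIVE

    def on_zk_event(self, event):
        if event == "SUSPENDED":
            # Connection to ZooKeeper lost; instead of exiting (the current
            # behavior the reporters object to), drop to standby.
            self.state = MasterState.STANDBY
        elif event == "RECONNECTED" and self.state == MasterState.STANDBY:
            # Session recovered: resume leadership duties (possibly after
            # re-election, elided here).
            self.state = MasterState.ALIVE
        elif event == "LOST":
            # Session definitively expired and re-election failed: only now
            # does the process give up.
            self.state = MasterState.DEAD
```

In this model, the log sequence from the report (`SUSPENDED` followed by `We have lost leadership`) would leave the master in standby rather than killing the process.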
[jira] [Commented] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly
[ https://issues.apache.org/jira/browse/SPARK-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458835#comment-16458835 ] Thomas Graves commented on SPARK-24124: --- [~vanzin] any objections to this? > Spark history server should create spark.history.store.path and set > permissions properly > > > Key: SPARK-24124 > URL: https://issues.apache.org/jira/browse/SPARK-24124 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Major > > Current with the new spark history server you can set > spark.history.store.path to a location to store the levelDB files. Currently > the directory has to be made before it can use that path. > We should just have the history server create it and set the file permissions > on the leveldb files to be restrictive -> new FsPermission((short) 0700) > the shuffle service already does this, this would be much more convenient to > use and prevent people from making mistakes with the permissions on the > directory and files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly
Thomas Graves created SPARK-24124: - Summary: Spark history server should create spark.history.store.path and set permissions properly Key: SPARK-24124 URL: https://issues.apache.org/jira/browse/SPARK-24124 Project: Spark Issue Type: Story Components: Spark Core Affects Versions: 2.3.0 Reporter: Thomas Graves Current with the new spark history server you can set spark.history.store.path to a location to store the levelDB files. Currently the directory has to be made before it can use that path. We should just have the history server create it and set the file permissions on the leveldb files to be restrictive -> new FsPermission((short) 0700) the shuffle service already does this, this would be much more convenient to use and prevent people from making mistakes with the permissions on the directory and files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
Dongjoon Hyun created SPARK-24123:
-
Summary: Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
Key: SPARK-24123
URL: https://issues.apache.org/jira/browse/SPARK-24123
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Dongjoon Hyun

**MASTER BRANCH**
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/
{code}
Error Message
3.949596773820191 did not equal 3.9495967741935485
Stacktrace
org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not equal 3.9495967741935485
 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
 at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
 at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
 at org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495)
 at org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488)
{code}
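The failure above is a floating-point disagreement in the ninth decimal place (3.949596773820191 vs 3.9495967741935485). A common way to de-flake such a test is to compare with a tolerance instead of exact equality; a sketch of the idea in Python (the actual fix in the Scala test suite may differ):

```python
def approx_equal(actual: float, expected: float, tol: float = 1e-8) -> bool:
    """Absolute-tolerance comparison for fractional month counts.

    Exact == on doubles is what makes the test flaky: two mathematically
    equivalent computations can differ by a few ulps.
    """
    return abs(actual - expected) <= tol
```

With this check, the two values from the Jenkins failure compare equal, while genuinely different month counts still fail.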
[jira] [Commented] (SPARK-23993) Support DESC FORMATTED table_name column_name
[ https://issues.apache.org/jira/browse/SPARK-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458822#comment-16458822 ]
Sunitha Kambhampati commented on SPARK-23993:
-
I just tried it out on the current trunk and the command is supported. I looked at the code: there is a DescribeColumnCommand, and it was implemented as part of SPARK-17642 "[SQL] support DESC EXTENDED/FORMATTED table column commands".
> Support DESC FORMATTED table_name column_name
>
> Key: SPARK-23993
> URL: https://issues.apache.org/jira/browse/SPARK-23993
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.2
> Reporter: Volodymyr Glushak
> Priority: Major
>
> Hive and Spark both support:
> {code}
> DESC FORMATTED table_name{code}
> which gives table metadata.
> If you want to get metadata for a particular column in Hive, you can execute:
> {code}
> DESC FORMATTED table_name column_name{code}
> This is not supported in Spark.
[jira] [Commented] (SPARK-23975) Allow Clustering to take Arrays of Double as input features
[ https://issues.apache.org/jira/browse/SPARK-23975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458778#comment-16458778 ]
Apache Spark commented on SPARK-23975:
--
User 'ludatabricks' has created a pull request for this issue: https://github.com/apache/spark/pull/21195
> Allow Clustering to take Arrays of Double as input features
>
> Key: SPARK-23975
> URL: https://issues.apache.org/jira/browse/SPARK-23975
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Lu Wang
> Assignee: Lu Wang
> Priority: Major
>
> Clustering algorithms should accept Arrays in addition to Vectors as input features. The Python interface should also be changed accordingly, which would make PySpark a lot easier to use.
[jira] [Resolved] (SPARK-24072) clearly define pushed filters
[ https://issues.apache.org/jira/browse/SPARK-24072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-24072.
-
Resolution: Fixed
Fix Version/s: 2.4.0
> clearly define pushed filters
>
> Key: SPARK-24072
> URL: https://issues.apache.org/jira/browse/SPARK-24072
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Fix For: 2.4.0
[jira] [Created] (SPARK-24122) Allow automatic driver restarts on K8s
Oz Ben-Ami created SPARK-24122:
--
Summary: Allow automatic driver restarts on K8s
Key: SPARK-24122
URL: https://issues.apache.org/jira/browse/SPARK-24122
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Oz Ben-Ami

[~foxish] Right now SparkSubmit creates the driver as a bare pod, rather than under a managed controller like a Deployment or a StatefulSet. This means there is no way to guarantee automatic restarts, e.g. in case a node has an issue. Note that a Pod's RestartPolicy does not apply if a node fails. A StatefulSet would allow us to guarantee restarts, and keep the ability for executors to find the driver using DNS. This is particularly helpful for long-running streaming workloads, where we currently use {{yarn.resourcemanager.am.max-attempts}} with YARN. I can confirm that Spark Streaming and Structured Streaming applications can be made to recover from such a restart, with the help of checkpointing. The executors will have to be started again by the driver, but this should not be a problem. For batch processing, we could alternatively use Kubernetes {{Job}} objects, which restart pods on failure but not on success. For example, note the semantics provided by the {{kubectl run}} [command|https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#run]:
* {{--restart=Never}}: bare Pod
* {{--restart=Always}}: Deployment
* {{--restart=OnFailure}}: Job

https://github.com/apache-spark-on-k8s/spark/issues/288
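A hypothetical sketch of what running the driver under a StatefulSet might look like. The manifest below is illustrative only: the name, labels, and image are invented, and this is not what spark-submit generates today.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spark-driver            # illustrative name
spec:
  serviceName: spark-driver     # gives the pod a stable DNS name executors can resolve
  replicas: 1
  selector:
    matchLabels:
      app: spark-driver
  template:
    metadata:
      labels:
        app: spark-driver
    spec:
      restartPolicy: Always     # the controller reschedules the pod even if its node fails
      containers:
      - name: driver
        image: registry.example.com/spark-streaming-app:latest   # illustrative image
```

Because the StatefulSet controller (not the kubelet) owns the pod, the driver comes back after a node failure, and checkpointing lets a streaming query resume where it left off.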
[jira] [Commented] (SPARK-24046) Rate Source doesn't gradually increase rate when rampUpTime>=RowsPerSecond
[ https://issues.apache.org/jira/browse/SPARK-24046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458636#comment-16458636 ]
Apache Spark commented on SPARK-24046:
--
User 'maasg' has created a pull request for this issue: https://github.com/apache/spark/pull/21194
> Rate Source doesn't gradually increase rate when rampUpTime>=RowsPerSecond
>
> Key: SPARK-24046
> URL: https://issues.apache.org/jira/browse/SPARK-24046
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.3.0
> Environment: Spark 2.3.0 using Spark Shell on Ubuntu 17.4 (the environment is not important; the issue lies in the rate calculation)
> Reporter: Gerard Maas
> Priority: Major
> Labels: RateSource
> Attachments: image-2018-04-22-22-03-03-945.png, image-2018-04-22-22-06-49-202.png
>
> When using the rate source in Structured Streaming, the `rampUpTime` feature fails to gradually increase the stream rate when the `rampUpTime` option is equal to or greater than `rowsPerSecond`.
> When `rampUpTime >= rowsPerSecond`, all batches at `time < rampUpTime` contain 0 values. The rate jumps to `rowsPerSecond` when `time > rampUpTime`.
> The following scenario, executed in the `spark-shell`, demonstrates this issue:
> {code:java}
> // Using rampUpTime(10) > rowsPerSecond(5)
> {code}
> {code:java}
> val stream = spark.readStream
>   .format("rate")
>   .option("rowsPerSecond", 5)
>   .option("rampUpTime", 10)
>   .load()
> val query = stream.writeStream.format("console").start()
> // Exiting paste mode, now interpreting.
> stream: org.apache.spark.sql.DataFrame = [timestamp: timestamp, value: bigint]
> query: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@cf82c58
> -------------------------------------------
> Batch: 0
> -------------------------------------------
> +---------+-----+
> |timestamp|value|
> +---------+-----+
> +---------+-----+
> (Batches 1 through 10 produce the same empty output.)
> -------------------------------------------
> Batch: 11
> -------------------------------------------
> +--------------------+-----+
> |           timestamp|value|
> +--------------------+-----+
> |2018-04-22 17:08:...|    0|
> |2018-04-22 17:08:...|    1|
> |2018-04-22 17:08:...|    2|
> |2018-04-22 17:08:...|    3|
> |2018-04-22 17:08:...|    4|
> +--------------------+-----+
> -------------------------------------------
> Batch: 12
> -------------------------------------------
> +--------------------+-----+
> |           timestamp|value|
> +--------------------+-----+
> |2018-04-22 17:08:...|    5|
> |2018-04-22 17:08:...|    6|
> |2018-04-22 17:08:...|    7|
> |2018-04-22 17:08:...|    8|
> |2018-04-22 17:08:...|    9|
> +--------------------+-----+
> {code}
>
> This scenario shows rowsPerSecond == rampUpTime, which also fails:
> {code:java}
> val stream = spark.readStream
>   .format("rate")
>   .option("rowsPerSecond", 10)
>   .option("rampUpTime", 10)
>   .load()
> val query =
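One plausible shape for an all-or-nothing ramp-up like the one reported — not confirmed against the actual RateStreamProvider code — is integer division truncating the per-second increment to zero whenever rampUpTime >= rowsPerSecond. A toy Python model of the broken and the intended formulas:

```python
def naive_rate(seconds_elapsed: int, rows_per_second: int, ramp_up_time: int) -> int:
    """Toy model of the suspected bug: divide before multiplying.

    rows_per_second // ramp_up_time is 0 whenever ramp_up_time >=
    rows_per_second, so every batch during ramp-up is empty, and the rate
    jumps straight to rows_per_second once ramp-up ends."""
    if seconds_elapsed >= ramp_up_time:
        return rows_per_second
    return (rows_per_second // ramp_up_time) * seconds_elapsed

def gradual_rate(seconds_elapsed: int, rows_per_second: int, ramp_up_time: int) -> int:
    """Multiply before dividing, so the rate grows smoothly during ramp-up."""
    if seconds_elapsed >= ramp_up_time:
        return rows_per_second
    return rows_per_second * seconds_elapsed // ramp_up_time
```

With rowsPerSecond=5 and rampUpTime=10, the naive form yields 0 rows for every ramp-up second, matching the empty batches shown above, while the gradual form climbs from 0 to 5.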
[jira] [Assigned] (SPARK-24121) The API for handling expression code generation in expression codegen
[ https://issues.apache.org/jira/browse/SPARK-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-24121:
Assignee: Apache Spark
> The API for handling expression code generation in expression codegen
>
> Key: SPARK-24121
> URL: https://issues.apache.org/jira/browse/SPARK-24121
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Liang-Chi Hsieh
> Assignee: Apache Spark
> Priority: Major
>
> In order to achieve the replacement of expr values during expression codegen (please see the proposal at [https://github.com/apache/spark/pull/19813#issuecomment-354045400]), we need an API to handle the insertion of temporary symbols for the statements generated by expressions. This API must allow us to know which statements expressions produce during codegen, and to use symbols instead of the actual code when generating Java code.
[jira] [Assigned] (SPARK-24121) The API for handling expression code generation in expression codegen
[ https://issues.apache.org/jira/browse/SPARK-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-24121:
Assignee: (was: Apache Spark)
> The API for handling expression code generation in expression codegen
>
> Key: SPARK-24121
> URL: https://issues.apache.org/jira/browse/SPARK-24121
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Liang-Chi Hsieh
> Priority: Major
>
> In order to achieve the replacement of expr values during expression codegen (please see the proposal at [https://github.com/apache/spark/pull/19813#issuecomment-354045400]), we need an API to handle the insertion of temporary symbols for the statements generated by expressions. This API must allow us to know which statements expressions produce during codegen, and to use symbols instead of the actual code when generating Java code.
[jira] [Commented] (SPARK-24121) The API for handling expression code generation in expression codegen
[ https://issues.apache.org/jira/browse/SPARK-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458574#comment-16458574 ]
Apache Spark commented on SPARK-24121:
--
User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/21193
> The API for handling expression code generation in expression codegen
>
> Key: SPARK-24121
> URL: https://issues.apache.org/jira/browse/SPARK-24121
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Liang-Chi Hsieh
> Priority: Major
>
> In order to achieve the replacement of expr values during expression codegen (please see the proposal at [https://github.com/apache/spark/pull/19813#issuecomment-354045400]), we need an API to handle the insertion of temporary symbols for the statements generated by expressions. This API must allow us to know which statements expressions produce during codegen, and to use symbols instead of the actual code when generating Java code.
[jira] [Created] (SPARK-24121) The API for handling expression code generation in expression codegen
Liang-Chi Hsieh created SPARK-24121:
---
Summary: The API for handling expression code generation in expression codegen
Key: SPARK-24121
URL: https://issues.apache.org/jira/browse/SPARK-24121
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Liang-Chi Hsieh

In order to achieve the replacement of expr values during expression codegen (please see the proposal at [https://github.com/apache/spark/pull/19813#issuecomment-354045400]), we need an API to handle the insertion of temporary symbols for the statements generated by expressions. This API must allow us to know which statements expressions produce during codegen, and to use symbols instead of the actual code when generating Java code.
[jira] [Commented] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date
[ https://issues.apache.org/jira/browse/SPARK-23549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458511#comment-16458511 ]
Emlyn Corrin commented on SPARK-23549:
--
Will this be included in Spark 2.3.1? It only lists 2.4.0 as the fix version.
> Spark SQL unexpected behavior when comparing timestamp to date
>
> Key: SPARK-23549
> URL: https://issues.apache.org/jira/browse/SPARK-23549
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
> Reporter: Dong Jiang
> Assignee: Kazuaki Ishizaki
> Priority: Major
> Fix For: 2.4.0
>
> {code:java}
> scala> spark.version
> res1: String = 2.2.1
> scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |                                                                                                                                                                                                       false|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> {code}
> As shown above, when a timestamp is compared to a date in Spark SQL, both the timestamp and the date are downcast to strings, leading to an unexpected result. If I run the same SQL in Presto/Athena, I get the expected result:
> {code:java}
> select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
> _col0
> 1 true
> {code}
> Is this a bug in Spark or a feature?
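The unexpected `false` falls directly out of lexicographic string comparison once both operands are cast to STRING, as the analyzed plan shows. Ordinary Python strings reproduce the effect (illustration only — this is not Spark code):

```python
ts = "2017-03-01 00:00:00"   # CAST(timestamp AS STRING)
lo = "2017-02-28"            # CAST(date AS STRING)
hi = "2017-03-01"            # CAST(date AS STRING)

# BETWEEN becomes two string comparisons:
ge = ts >= lo    # True:  "2017-03" sorts after "2017-02"
le = ts <= hi    # False: ts shares the prefix "2017-03-01" with hi, and the
                 # trailing " 00:00:00" makes the longer string compare greater
between = ge and le   # False, matching the Spark SQL output
```

Comparing the values as timestamps/dates (as Presto does) instead of strings gives the intuitive `true`.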
[jira] [Resolved] (SPARK-23840) PySpark error when converting a DataFrame to rdd
[ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23840.
--
Resolution: Invalid
I am leaving this resolved. It sounds like there's no way to go further without more information.
> PySpark error when converting a DataFrame to rdd
>
> Key: SPARK-23840
> URL: https://issues.apache.org/jira/browse/SPARK-23840
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.3.0
> Reporter: Uri Goren
> Priority: Major
>
> I am running code in the `pyspark` shell on an `emr` cluster, and encountering an error I have never seen before...
> This line works:
> spark.read.parquet(s3_input).take(99)
> While this line causes an exception:
> spark.read.parquet(s3_input).rdd.take(99)
> With:
> TypeError: 'int' object is not iterable