[jira] [Commented] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459427#comment-16459427
 ] 

Apache Spark commented on SPARK-24131:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/21203

> Add majorMinorVersion API to PySpark for determining Spark versions
> ---
>
> Key: SPARK-24131
> URL: https://issues.apache.org/jira/browse/SPARK-24131
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> We need to determine Spark major and minor versions in PySpark. We can add a  
> {{majorMinorVersion}} API to PySpark which is similar to the API in 
> {{VersionUtils.majorMinorVersion}}.
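For illustration, a minimal Python sketch of what such a helper might look like
(the function name mirrors the Scala utility; the exact signature, placement and
error message are assumptions, not the final API):

{code:python}
import re

def majorMinorVersion(sparkVersion):
    """Return (major, minor) parsed from a version string, e.g. "2.4.0" -> (2, 4).

    Illustrative sketch only; intended to mirror Scala's
    VersionUtils.majorMinorVersion.
    """
    m = re.match(r"^(\d+)\.(\d+)(\..*)?$", sparkVersion.strip())
    if m is None:
        raise ValueError("Unable to extract major and minor versions from "
                         "Spark version string: %s" % sparkVersion)
    return int(m.group(1)), int(m.group(2))

assert majorMinorVersion("2.4.0") == (2, 4)
{code}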






[jira] [Assigned] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24131:


Assignee: Apache Spark

> Add majorMinorVersion API to PySpark for determining Spark versions
> ---
>
> Key: SPARK-24131
> URL: https://issues.apache.org/jira/browse/SPARK-24131
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Minor
>
> We need to determine Spark major and minor versions in PySpark. We can add a  
> {{majorMinorVersion}} API to PySpark which is similar to the API in 
> {{VersionUtils.majorMinorVersion}}.






[jira] [Assigned] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24131:


Assignee: (was: Apache Spark)

> Add majorMinorVersion API to PySpark for determining Spark versions
> ---
>
> Key: SPARK-24131
> URL: https://issues.apache.org/jira/browse/SPARK-24131
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> We need to determine Spark major and minor versions in PySpark. We can add a  
> {{majorMinorVersion}} API to PySpark which is similar to the API in 
> {{VersionUtils.majorMinorVersion}}.






[jira] [Created] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions

2018-04-30 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24131:
---

 Summary: Add majorMinorVersion API to PySpark for determining 
Spark versions
 Key: SPARK-24131
 URL: https://issues.apache.org/jira/browse/SPARK-24131
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 2.4.0
Reporter: Liang-Chi Hsieh


We need to determine Spark major and minor versions in PySpark. We can add a  
{{majorMinorVersion}} API to PySpark which is similar to the API in 
{{VersionUtils.majorMinorVersion}}.






[jira] [Created] (SPARK-24130) Data Source V2: Join Push Down

2018-04-30 Thread Jia Li (JIRA)
Jia Li created SPARK-24130:
--

 Summary: Data Source V2: Join Push Down
 Key: SPARK-24130
 URL: https://issues.apache.org/jira/browse/SPARK-24130
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Jia Li


Spark applications often directly query external data sources such as
relational databases or files. Spark provides Data Sources APIs for accessing
structured data through Spark SQL. The Data Sources APIs in both V1 and V2
support optimizations such as filter push down and column pruning, which are a
subset of the functionality that can be pushed down to some data sources. We're
proposing to extend the Data Sources APIs with join push down (JPD). Join push
down significantly improves query performance by reducing the amount of data
transferred and exploiting the capabilities of the data sources, such as index
access.

Join push down design document is available 
[here|https://docs.google.com/document/d/1k-kRadTcUbxVfUQwqBbIXs_yPZMxh18-e-cz77O_TaE/edit?usp=sharing].






[jira] [Resolved] (SPARK-23853) Skip doctests which require hive support built in PySpark

2018-04-30 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-23853.
--
   Resolution: Fixed
Fix Version/s: 2.3.1
   2.4.0

Issue resolved by pull request 21141
[https://github.com/apache/spark/pull/21141]

> Skip doctests which require hive support built in PySpark
> -
>
> Key: SPARK-23853
> URL: https://issues.apache.org/jira/browse/SPARK-23853
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: holdenk
>Assignee: Dongjoon Hyun
>Priority: Trivial
> Fix For: 2.4.0, 2.3.1
>
>
> As we do when detecting whether various optional libraries are installed, we
> should skip the tests that require Hive when Hive support is not built in,
> e.g. the readwrite doctest.
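A rough sketch of the skip pattern being described, assuming a doctest runner
shaped like the {{_test()}} functions in the PySpark modules (the structure
below is illustrative, not the actual change):

{code:python}
from pyspark.sql import SparkSession

def _test():
    import doctest
    try:
        # If Spark was built without Hive, building and touching a Hive-enabled
        # session is expected to fail somewhere around creation / first use.
        spark = SparkSession.builder.enableHiveSupport().getOrCreate()
        spark.sql("SHOW TABLES").collect()
    except Exception:
        print("Skipping doctests that require Hive support")
        return
    try:
        doctest.testmod()  # run this module's doctests against the Hive-enabled session
    finally:
        spark.stop()
{code}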






[jira] [Assigned] (SPARK-23853) Skip doctests which require hive support built in PySpark

2018-04-30 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-23853:


Assignee: Dongjoon Hyun

> Skip doctests which require hive support built in PySpark
> -
>
> Key: SPARK-23853
> URL: https://issues.apache.org/jira/browse/SPARK-23853
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: holdenk
>Assignee: Dongjoon Hyun
>Priority: Trivial
> Fix For: 2.3.1, 2.4.0
>
>
> As we do when detecting whether various optional libraries are installed, we
> should skip the tests that require Hive when Hive support is not built in,
> e.g. the readwrite doctest.






[jira] [Assigned] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24129:


Assignee: (was: Apache Spark)

> Add option to pass --build-arg's to docker-image-tool.sh
> 
>
> Key: SPARK-24129
> URL: https://issues.apache.org/jira/browse/SPARK-24129
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Devaraj K
>Priority: Minor
>
> When working behind a firewall, we may need to pass proxy details as part of
> the docker --build-arg parameters to build the image, but docker-image-tool.sh
> doesn't provide an option to pass the proxy details or any other --build-arg
> values to the docker command.






[jira] [Commented] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459349#comment-16459349
 ] 

Apache Spark commented on SPARK-24129:
--

User 'devaraj-kavali' has created a pull request for this issue:
https://github.com/apache/spark/pull/21202

> Add option to pass --build-arg's to docker-image-tool.sh
> 
>
> Key: SPARK-24129
> URL: https://issues.apache.org/jira/browse/SPARK-24129
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Devaraj K
>Priority: Minor
>
> When working behind a firewall, we may need to pass proxy details as part of
> the docker --build-arg parameters to build the image, but docker-image-tool.sh
> doesn't provide an option to pass the proxy details or any other --build-arg
> values to the docker command.






[jira] [Assigned] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24129:


Assignee: Apache Spark

> Add option to pass --build-arg's to docker-image-tool.sh
> 
>
> Key: SPARK-24129
> URL: https://issues.apache.org/jira/browse/SPARK-24129
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Devaraj K
>Assignee: Apache Spark
>Priority: Minor
>
> When working behind a firewall, we may need to pass proxy details as part of
> the docker --build-arg parameters to build the image, but docker-image-tool.sh
> doesn't provide an option to pass the proxy details or any other --build-arg
> values to the docker command.






[jira] [Updated] (SPARK-23940) High-order function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2>

2018-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated SPARK-23940:
---
Summary: High-order function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2>
  (was: High-ofer function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2>)

> High-order function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2>
> ------------------------------------------------------------------------------------
>
> Key: SPARK-23940
> URL: https://issues.apache.org/jira/browse/SPARK-23940
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/map.html
> Returns a map that applies function to each entry of map and transforms the 
> values.
> {noformat}
> SELECT transform_values(MAP(ARRAY[], ARRAY[]), (k, v) -> v + 1); -- {}
> SELECT transform_values(MAP(ARRAY [1, 2, 3], ARRAY [10, 20, 30]), (k, v) -> v + k); -- {1 -> 11, 2 -> 22, 3 -> 33}
> SELECT transform_values(MAP(ARRAY [1, 2, 3], ARRAY ['a', 'b', 'c']), (k, v) -> k * k); -- {1 -> 1, 2 -> 4, 3 -> 9}
> SELECT transform_values(MAP(ARRAY ['a', 'b'], ARRAY [1, 2]), (k, v) -> k || CAST(v as VARCHAR)); -- {a -> a1, b -> b2}
> SELECT transform_values(MAP(ARRAY [1, 2], ARRAY [1.0, 1.4]),
>                         (k, v) -> MAP(ARRAY[1, 2], ARRAY['one', 'two'])[k] || '_' || CAST(v AS VARCHAR)); -- {1 -> one_1.0, 2 -> two_1.4}
> {noformat}






[jira] [Commented] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459333#comment-16459333
 ] 

Apache Spark commented on SPARK-24128:
--

User 'henryr' has created a pull request for this issue:
https://github.com/apache/spark/pull/21201

> Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
> ---
>
> Key: SPARK-24128
> URL: https://issues.apache.org/jira/browse/SPARK-24128
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Henry Robinson
>Priority: Minor
>
> The error message given when a query contains an implicit cartesian product
> suggests rewriting the query using {{CROSS JOIN}}, but does not mention
> disabling the check via {{spark.sql.crossJoin.enabled=true}}. It's sometimes
> easier to change a config variable than to edit a query, so it would be
> helpful to make the user aware of both options.
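For context, the configuration route looks like this in PySpark (the tiny
DataFrames are made up for the example):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow implicit cartesian products instead of rewriting the query with CROSS JOIN.
spark.conf.set("spark.sql.crossJoin.enabled", "true")

left = spark.range(3)
right = spark.range(2)

# With the flag unset, a join without a condition is rejected as an implicit
# cartesian product; with it set, the query runs.
left.join(right).show()
{code}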






[jira] [Assigned] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24128:


Assignee: (was: Apache Spark)

> Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
> ---
>
> Key: SPARK-24128
> URL: https://issues.apache.org/jira/browse/SPARK-24128
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Henry Robinson
>Priority: Minor
>
> The error message given when a query contains an implicit cartesian product
> suggests rewriting the query using {{CROSS JOIN}}, but does not mention
> disabling the check via {{spark.sql.crossJoin.enabled=true}}. It's sometimes
> easier to change a config variable than to edit a query, so it would be
> helpful to make the user aware of both options.






[jira] [Assigned] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24128:


Assignee: Apache Spark

> Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
> ---
>
> Key: SPARK-24128
> URL: https://issues.apache.org/jira/browse/SPARK-24128
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Henry Robinson
>Assignee: Apache Spark
>Priority: Minor
>
> The error message given when a query contains an implicit cartesian product
> suggests rewriting the query using {{CROSS JOIN}}, but does not mention
> disabling the check via {{spark.sql.crossJoin.enabled=true}}. It's sometimes
> easier to change a config variable than to edit a query, so it would be
> helpful to make the user aware of both options.






[jira] [Created] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh

2018-04-30 Thread Devaraj K (JIRA)
Devaraj K created SPARK-24129:
-

 Summary: Add option to pass --build-arg's to docker-image-tool.sh
 Key: SPARK-24129
 URL: https://issues.apache.org/jira/browse/SPARK-24129
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Devaraj K


When working behind a firewall, we may need to pass proxy details as part of
the docker --build-arg parameters to build the image, but docker-image-tool.sh
doesn't provide an option to pass the proxy details or any other --build-arg
values to the docker command.






[jira] [Created] (SPARK-24128) Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg

2018-04-30 Thread Henry Robinson (JIRA)
Henry Robinson created SPARK-24128:
--

 Summary: Mention spark.sql.crossJoin.enabled in implicit cartesian 
product error msg
 Key: SPARK-24128
 URL: https://issues.apache.org/jira/browse/SPARK-24128
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Henry Robinson


The error message given when a query contains an implicit cartesian product
suggests rewriting the query using {{CROSS JOIN}}, but does not mention
disabling the check via {{spark.sql.crossJoin.enabled=true}}. It's sometimes
easier to change a config variable than to edit a query, so it would be
helpful to make the user aware of both options.






[jira] [Assigned] (SPARK-24039) remove restarting iterators hack

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24039:


Assignee: Apache Spark

> remove restarting iterators hack
> 
>
> Key: SPARK-24039
> URL: https://issues.apache.org/jira/browse/SPARK-24039
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Assignee: Apache Spark
>Priority: Major
>
> Currently, continuous processing execution calls next() to restart the query 
> iterator after it returns false. This doesn't work for complex RDDs - we need 
> to call compute() instead.
> This isn't refactoring-only; changes will be required to keep the reader from 
> starting over in each compute() call.






[jira] [Commented] (SPARK-24039) remove restarting iterators hack

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459208#comment-16459208
 ] 

Apache Spark commented on SPARK-24039:
--

User 'jose-torres' has created a pull request for this issue:
https://github.com/apache/spark/pull/21200

> remove restarting iterators hack
> 
>
> Key: SPARK-24039
> URL: https://issues.apache.org/jira/browse/SPARK-24039
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>
> Currently, continuous processing execution calls next() to restart the query 
> iterator after it returns false. This doesn't work for complex RDDs - we need 
> to call compute() instead.
> This isn't refactoring-only; changes will be required to keep the reader from 
> starting over in each compute() call.






[jira] [Assigned] (SPARK-24039) remove restarting iterators hack

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24039:


Assignee: (was: Apache Spark)

> remove restarting iterators hack
> 
>
> Key: SPARK-24039
> URL: https://issues.apache.org/jira/browse/SPARK-24039
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>
> Currently, continuous processing execution calls next() to restart the query 
> iterator after it returns false. This doesn't work for complex RDDs - we need 
> to call compute() instead.
> This isn't refactoring-only; changes will be required to keep the reader from 
> starting over in each compute() call.






[jira] [Commented] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date

2018-04-30 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459136#comment-16459136
 ] 

Dongjoon Hyun commented on SPARK-23549:
---

[~emlyn], this issue changes the behavior and introduces a new conf,
`spark.sql.hive.compareDateTimestampInTimestamp`. I don't think this will be
included in Spark 2.3.1.

> Spark SQL unexpected behavior when comparing timestamp to date
> --
>
> Key: SPARK-23549
> URL: https://issues.apache.org/jira/browse/SPARK-23549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
>Reporter: Dong Jiang
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>
> {code:java}
> scala> spark.version
> res1: String = 2.2.1
> scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between 
> cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
> +---------------------------------------------------------------------------------------------------------------+
> |((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
> +---------------------------------------------------------------------------------------------------------------+
> |false|
> +---------------------------------------------------------------------------------------------------------------+{code}
> As shown above, when a timestamp is compared to a date in Spark SQL, both the
> timestamp and the date are cast down to strings, leading to an unexpected
> result. If I run the same SQL in Presto/Athena, I get the expected result:
> {code:java}
> select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as 
> date) and cast('2017-03-01' as date)
>   _col0
> 1 true
> {code}
> Is this a bug for Spark or a feature?






[jira] [Assigned] (SPARK-24127) Support text socket source in continuous mode

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24127:


Assignee: (was: Apache Spark)

> Support text socket source in continuous mode
> -
>
> Key: SPARK-24127
> URL: https://issues.apache.org/jira/browse/SPARK-24127
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Arun Mahadevan
>Priority: Minor
>
> Currently the text socket source is supported for structured streaming micro 
> batch mode.
> Supporting it in continuous mode enables running structured streaming 
> continuous pipelines where one can ingest data via "nc" and run examples.
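Once this is in place, a continuous socket pipeline might look roughly as
follows in PySpark (host, port and trigger interval are placeholders; the
options mirror the existing micro-batch socket source):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("continuous-socket-example").getOrCreate()

# Feed data with: nc -lk 9999
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# A continuous trigger asks for continuous-mode execution instead of micro-batches.
query = (lines.writeStream
         .format("console")
         .trigger(continuous="1 second")
         .start())

query.awaitTermination()
{code}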






[jira] [Commented] (SPARK-24127) Support text socket source in continuous mode

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459129#comment-16459129
 ] 

Apache Spark commented on SPARK-24127:
--

User 'arunmahadevan' has created a pull request for this issue:
https://github.com/apache/spark/pull/21199

> Support text socket source in continuous mode
> -
>
> Key: SPARK-24127
> URL: https://issues.apache.org/jira/browse/SPARK-24127
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Arun Mahadevan
>Priority: Minor
>
> Currently the text socket source is supported for structured streaming micro 
> batch mode.
> Supporting it in continuous mode enables running structured streaming 
> continuous pipelines where one can ingest data via "nc" and run examples.






[jira] [Assigned] (SPARK-24127) Support text socket source in continuous mode

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24127:


Assignee: Apache Spark

> Support text socket source in continuous mode
> -
>
> Key: SPARK-24127
> URL: https://issues.apache.org/jira/browse/SPARK-24127
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Arun Mahadevan
>Assignee: Apache Spark
>Priority: Minor
>
> Currently the text socket source is supported for structured streaming micro 
> batch mode.
> Supporting it in continuous mode enables running structured streaming 
> continuous pipelines where one can ingest data via "nc" and run examples.






[jira] [Created] (SPARK-24127) Support text socket source in continuous mode

2018-04-30 Thread Arun Mahadevan (JIRA)
Arun Mahadevan created SPARK-24127:
--

 Summary: Support text socket source in continuous mode
 Key: SPARK-24127
 URL: https://issues.apache.org/jira/browse/SPARK-24127
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 2.4.0
Reporter: Arun Mahadevan


Currently the text socket source is supported for structured streaming micro 
batch mode.

Supporting it in continuous mode enables running structured streaming 
continuous pipelines where one can ingest data via "nc" and run examples.






[jira] [Commented] (SPARK-24115) improve instrumentation for spark.ml.tuning

2018-04-30 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459033#comment-16459033
 ] 

Joseph K. Bradley commented on SPARK-24115:
---

Sounds good; go ahead.

> improve instrumentation for spark.ml.tuning
> ---
>
> Key: SPARK-24115
> URL: https://issues.apache.org/jira/browse/SPARK-24115
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: yogesh garg
>Priority: Major
>







[jira] [Assigned] (SPARK-24003) Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's

2018-04-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-24003:
--

Assignee: Devaraj K

> Add support to provide spark.executor.extraJavaOptions in terms of App Id 
> and/or Executor Id's
> --
>
> Key: SPARK-24003
> URL: https://issues.apache.org/jira/browse/SPARK-24003
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Major
> Fix For: 2.4.0
>
>
> Users may want to enable GC logging or heap dumps for the executors, but the
> output can be overwritten by other executors since the paths cannot be
> expressed dynamically. This improvement would make it possible to express the
> spark.executor.extraJavaOptions paths in terms of the App Id and Executor Id,
> so that executors do not overwrite each other's files.
> There was a discussion about this in SPARK-3767, but it was never fixed.
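A hypothetical usage sketch, assuming the final change substitutes placeholders
such as {{APP_ID}} and {{EXECUTOR_ID}} into the executor options (the
placeholder names are an assumption; see the linked pull request for the exact
form):

{code:python}
from pyspark import SparkConf

# Route each executor's GC log to a per-application, per-executor file so
# executors don't overwrite each other. The {{APP_ID}} / {{EXECUTOR_ID}}
# tokens are assumed placeholder names, not confirmed syntax.
conf = (SparkConf()
        .set("spark.executor.extraJavaOptions",
             "-verbose:gc -Xloggc:/tmp/gc-{{APP_ID}}-{{EXECUTOR_ID}}.log"))
{code}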






[jira] [Resolved] (SPARK-24003) Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's

2018-04-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-24003.

   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21088
[https://github.com/apache/spark/pull/21088]

> Add support to provide spark.executor.extraJavaOptions in terms of App Id 
> and/or Executor Id's
> --
>
> Key: SPARK-24003
> URL: https://issues.apache.org/jira/browse/SPARK-24003
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Major
> Fix For: 2.4.0
>
>
> Users may want to enable GC logging or heap dumps for the executors, but the
> output can be overwritten by other executors since the paths cannot be
> expressed dynamically. This improvement would make it possible to express the
> spark.executor.extraJavaOptions paths in terms of the App Id and Executor Id,
> so that executors do not overwrite each other's files.
> There was a discussion about this in SPARK-3767, but it was never fixed.






[jira] [Updated] (SPARK-23781) Merge YARN and Mesos token renewal code

2018-04-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-23781:
---
Component/s: (was: yan)
 YARN

> Merge YARN and Mesos token renewal code
> ---
>
> Key: SPARK-23781
> URL: https://issues.apache.org/jira/browse/SPARK-23781
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, YARN
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> With the fix for SPARK-23361, the code that handles delegation tokens in 
> Mesos and YARN ends up being very similar.
> We should refactor that code so that both backends share the same code, which
> would also make it easier for other cluster managers to use it.






[jira] [Commented] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459000#comment-16459000
 ] 

Apache Spark commented on SPARK-24126:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/21198

> PySpark tests leave a lot of garbage in /tmp
> 
>
> Key: SPARK-24126
> URL: https://issues.apache.org/jira/browse/SPARK-24126
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> When you run pyspark tests, they leave a lot of garbage in /tmp. The test 
> code should do a better job at cleaning up after itself, and also try to keep 
> things under the build directory so that things like "mvn clean" or "git 
> clean" can do their thing.






[jira] [Assigned] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24126:


Assignee: Apache Spark

> PySpark tests leave a lot of garbage in /tmp
> 
>
> Key: SPARK-24126
> URL: https://issues.apache.org/jira/browse/SPARK-24126
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Minor
>
> When you run pyspark tests, they leave a lot of garbage in /tmp. The test 
> code should do a better job at cleaning up after itself, and also try to keep 
> things under the build directory so that things like "mvn clean" or "git 
> clean" can do their thing.






[jira] [Assigned] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24126:


Assignee: (was: Apache Spark)

> PySpark tests leave a lot of garbage in /tmp
> 
>
> Key: SPARK-24126
> URL: https://issues.apache.org/jira/browse/SPARK-24126
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> When you run pyspark tests, they leave a lot of garbage in /tmp. The test 
> code should do a better job at cleaning up after itself, and also try to keep 
> things under the build directory so that things like "mvn clean" or "git 
> clean" can do their thing.






[jira] [Commented] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp

2018-04-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458974#comment-16458974
 ] 

Marcelo Vanzin commented on SPARK-24126:


BTW I'm testing changes to implement this.

> PySpark tests leave a lot of garbage in /tmp
> 
>
> Key: SPARK-24126
> URL: https://issues.apache.org/jira/browse/SPARK-24126
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> When you run pyspark tests, they leave a lot of garbage in /tmp. The test 
> code should do a better job at cleaning up after itself, and also try to keep 
> things under the build directory so that things like "mvn clean" or "git 
> clean" can do their thing.






[jira] [Created] (SPARK-24126) PySpark tests leave a lot of garbage in /tmp

2018-04-30 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-24126:
--

 Summary: PySpark tests leave a lot of garbage in /tmp
 Key: SPARK-24126
 URL: https://issues.apache.org/jira/browse/SPARK-24126
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Tests
Affects Versions: 2.4.0
Reporter: Marcelo Vanzin


When you run pyspark tests, they leave a lot of garbage in /tmp. The test code 
should do a better job at cleaning up after itself, and also try to keep things 
under the build directory so that things like "mvn clean" or "git clean" can do 
their thing.






[jira] [Created] (SPARK-24125) Add quoting rules to SQL guide

2018-04-30 Thread Henry Robinson (JIRA)
Henry Robinson created SPARK-24125:
--

 Summary: Add quoting rules to SQL guide
 Key: SPARK-24125
 URL: https://issues.apache.org/jira/browse/SPARK-24125
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Henry Robinson


As far as I can tell, Spark SQL's quoting rules are as follows:

* {{`foo bar`}} is an identifier
* {{'foo bar'}} is a string literal
* {{"foo bar"}} is a string literal

The last of these is non-standard (usually {{"foo bar"}} is an identifier), and 
so it's probably worth mentioning these rules in the 'reference' section of the 
[SQL 
guide|http://spark.apache.org/docs/latest/sql-programming-guide.html#reference].

I'm assuming there's not a lot of enthusiasm to change the quoting rules, given 
it would be a breaking change, and that backticks work just fine as an 
alternative. 
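A quick PySpark demonstration of the three forms (illustrative only; it simply
exercises the rules listed above):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(1).selectExpr("id AS `foo bar`").createOrReplaceTempView("t")

spark.sql("SELECT `foo bar` FROM t").show()    # backticks: identifier (the column)
spark.sql("SELECT 'foo bar' FROM t").show()    # single quotes: string literal
spark.sql('SELECT "foo bar" FROM t').show()    # double quotes: also a string literal
{code}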






[jira] [Commented] (SPARK-23971) Should not leak Spark sessions across test suites

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458959#comment-16458959
 ] 

Apache Spark commented on SPARK-23971:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/21197

> Should not leak Spark sessions across test suites
> -
>
> Key: SPARK-23971
> URL: https://issues.apache.org/jira/browse/SPARK-23971
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Eric Liang
>Assignee: Eric Liang
>Priority: Major
> Fix For: 2.4.0
>
>
> Many suites currently leak Spark sessions (sometimes with stopped 
> SparkContexts) via the thread-local active Spark session and default Spark 
> session. We should attempt to clean these up and detect when this happens to 
> improve the reproducibility of tests.






[jira] [Assigned] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24123:


Assignee: (was: Apache Spark)

> Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
> ---
>
> Key: SPARK-24123
> URL: https://issues.apache.org/jira/browse/SPARK-24123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> **MASTER BRANCH**
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/
> {code}
> Error Message
> 3.949596773820191 did not equal 3.9495967741935485
> Stacktrace
>   org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not 
> equal 3.9495967741935485
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488)
> {code}
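For reference, the failure is an exact equality assertion on a floating-point
result; a generic way to make such a check robust (not necessarily what the
linked pull request does) is to compare against a tolerance:

{code:python}
import math

expected = 3.9495967741935485
actual = 3.949596773820191

# Exact equality is brittle for values derived from floating-point date math;
# rel_tol here is an arbitrary illustrative tolerance.
assert math.isclose(actual, expected, rel_tol=1e-7)
{code}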






[jira] [Commented] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458907#comment-16458907
 ] 

Apache Spark commented on SPARK-24123:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/21196

> Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
> ---
>
> Key: SPARK-24123
> URL: https://issues.apache.org/jira/browse/SPARK-24123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> **MASTER BRANCH**
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/
> {code}
> Error Message
> 3.949596773820191 did not equal 3.9495967741935485
> Stacktrace
>   org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not 
> equal 3.9495967741935485
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488)
> {code}






[jira] [Assigned] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24123:


Assignee: Apache Spark

> Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
> ---
>
> Key: SPARK-24123
> URL: https://issues.apache.org/jira/browse/SPARK-24123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> **MASTER BRANCH**
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/
> {code}
> Error Message
> 3.949596773820191 did not equal 3.9495967741935485
> Stacktrace
>   org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not 
> equal 3.9495967741935485
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488)
> {code}






[jira] [Commented] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly

2018-04-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458904#comment-16458904
 ] 

Marcelo Vanzin commented on SPARK-24124:


This should be fine.

> Spark history server should create spark.history.store.path and set 
> permissions properly
> 
>
> Key: SPARK-24124
> URL: https://issues.apache.org/jira/browse/SPARK-24124
> Project: Spark
>  Issue Type: Story
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Priority: Major
>
> Currently, with the new Spark history server, you can set
> spark.history.store.path to a location in which to store the LevelDB files,
> but the directory has to be created before the server can use that path.
> We should just have the history server create it and set restrictive file
> permissions on the LevelDB files -> new FsPermission((short) 0700).
> The shuffle service already does this. This would be much more convenient to
> use and would prevent people from making mistakes with the permissions on the
> directory and files.
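The intent (create the directory if needed and keep it owner-only, 0700) can be
illustrated with a small Python sketch; this is only an illustration of the
requested behaviour, not the history server's actual JVM implementation:

{code:python}
import os
import stat

def ensure_private_dir(path):
    """Create `path` if it does not exist and restrict it to the owner (0700)."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)  # 0o700: owner read/write/execute only

ensure_private_dir("/var/lib/spark/history-store")  # example path, not a Spark default
{code}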






[jira] [Commented] (SPARK-23530) It's not appropriate to let the original master exit while the leader of zookeeper shutdown

2018-04-30 Thread Ashwin Agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458895#comment-16458895
 ] 

Ashwin Agate commented on SPARK-23530:
--

Can we please increase the priority of this bug, since it exists in the latest
Spark 2.3.0 too? We have observed this during an upgrade scenario (with Spark
1.6.3) where we have to shut down ZooKeeper, which has the adverse side effect
of the Spark master shutting down on other nodes, which is far from ideal.

BTW, https://issues.apache.org/jira/browse/SPARK-15544 is a similar issue that
was filed for Spark 1.6.1.

> It's not appropriate to let the original master exit while the leader of 
> zookeeper shutdown
> ---
>
> Key: SPARK-23530
> URL: https://issues.apache.org/jira/browse/SPARK-23530
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.3.0
>Reporter: liuxianjiao
>Priority: Critical
>
> When the ZooKeeper leader shuts down, Spark's current approach is to let the
> master exit in order to revoke its leadership. However, this sacrifices a
> master node. Following the treatment in Hadoop and Storm, we should let the
> originally active master become standby, re-elect the Spark master, or revoke
> leadership gracefully in some other way.






[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-30 Thread Ashwin Agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645
 ] 

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---

 
{code:java}
 Can we please increase the priority of this bug since it exists in latest 
Spark 2.3.0 too?  We have observed this during upgrade scenario (with Spark 
1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect 
of spark master shutting down on other nodes which is not very ideal.

 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked – master shutting down.{code}


was (Author: agateaaa):
Can we please increase the priority of this bug since it exists in latest Spark 
2.3.0 too?  We have observed this during upgrade scenario (with Spark 1.6.3), 
where we have to shutdown zookeeper, which has the adverse side-effect of spark 
master shutting down on other nodes which is not very ideal.

 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked -- master shutting down.
{code:java}
 {code}

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export 

[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-30 Thread Ashwin Agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645
 ] 

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---

  Can we please increase the priority of this bug since it exists in latest 
Spark 2.3.0 too?  We have observed this during upgrade scenario (with Spark 
1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect 
of spark master shutting down on other nodes which is not very ideal.
{code:java}
 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked – master shutting down.{code}


was (Author: agateaaa):
 
{code:java}
 Can we please increase the priority of this bug since it exists in latest 
Spark 2.3.0 too?  We have observed this during upgrade scenario (with Spark 
1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect 
of spark master shutting down on other nodes which is not very ideal.

 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked – master shutting down.{code}

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down.
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export 

[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-30 Thread Ashwin Agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645
 ] 

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---

Can we please increase the priority of this bug, since it exists in the latest 
Spark 2.3.0 too? We have observed this during an upgrade scenario (with Spark 
1.6.3), where we have to shut down ZooKeeper, which has the adverse side effect 
of the Spark master shutting down on other nodes, which is far from ideal.

{code:java}
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked -- master shutting down.
{code}


was (Author: agateaaa):
Can we please increase the priority of this bug, since it exists in the latest 
Spark 2.3.0 too? We have observed this during an upgrade scenario (with Spark 
1.6.3), where we have to shut down ZooKeeper, which has the adverse side effect 
of the Spark master shutting down on other nodes, which is far from ideal.

 

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting down a single ZooKeeper node caused the Spark master to exit. The 
> master should have connected to a second ZooKeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down.
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-30 Thread agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645
 ] 

agate commented on SPARK-15544:
---

Can we please increase the priority of this bug, since it exists in the latest 
Spark 2.3.0 too? We have observed this during an upgrade scenario (with Spark 
1.6.3), where we have to shut down ZooKeeper, which has the adverse side effect 
of the Spark master shutting down on other nodes, which is far from ideal.

 

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting down a single ZooKeeper node caused the Spark master to exit. The 
> master should have connected to a second ZooKeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down.
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly

2018-04-30 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458835#comment-16458835
 ] 

Thomas Graves commented on SPARK-24124:
---

[~vanzin]  any objections to this?

> Spark history server should create spark.history.store.path and set 
> permissions properly
> 
>
> Key: SPARK-24124
> URL: https://issues.apache.org/jira/browse/SPARK-24124
> Project: Spark
>  Issue Type: Story
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Priority: Major
>
> Currently, with the new Spark history server, you can set 
> spark.history.store.path to a location in which to store the LevelDB files. The 
> directory has to be created before the history server can use that path.
> We should just have the history server create it and set restrictive file 
> permissions on the LevelDB files -> new FsPermission((short) 0700).
> The shuffle service already does this; it would be much more convenient to 
> use and would prevent people from making mistakes with the permissions on the 
> directory and files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly

2018-04-30 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-24124:
-

 Summary: Spark history server should create 
spark.history.store.path and set permissions properly
 Key: SPARK-24124
 URL: https://issues.apache.org/jira/browse/SPARK-24124
 Project: Spark
  Issue Type: Story
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Thomas Graves


Currently, with the new Spark history server, you can set spark.history.store.path 
to a location in which to store the LevelDB files. The directory has to be 
created before the history server can use that path.

We should just have the history server create it and set restrictive file 
permissions on the LevelDB files -> new FsPermission((short) 0700).

The shuffle service already does this; it would be much more convenient to 
use and would prevent people from making mistakes with the permissions on the 
directory and files.
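
For illustration, a minimal sketch (in Scala, using java.nio; the store path below 
is hypothetical, and the real implementation may well use Hadoop's FsPermission as 
the description suggests) of what the history server could do on startup:

{code:java}
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.PosixFilePermissions

// Hypothetical local directory configured via spark.history.store.path.
val storePath = Paths.get("/var/lib/spark/history-store")

// Create the directory if it does not exist yet, then restrict it to the
// owning user only (the equivalent of mode 0700).
Files.createDirectories(storePath)
Files.setPosixFilePermissions(storePath, PosixFilePermissions.fromString("rwx------"))
{code}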



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`

2018-04-30 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-24123:
-

 Summary: Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
 Key: SPARK-24123
 URL: https://issues.apache.org/jira/browse/SPARK-24123
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Dongjoon Hyun


**MASTER BRANCH**
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/

{code}
Error Message

3.949596773820191 did not equal 3.9495967741935485

Stacktrace

  org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not 
equal 3.9495967741935485
  at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
  at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
  at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488)
{code}
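
The failure is an exact floating-point equality check tripping over a tiny rounding 
difference. As an illustration only (not necessarily the fix applied to 
DateTimeUtilsSuite), a sketch of a tolerance-based comparison that would not be 
sensitive to such differences:

{code:java}
import org.scalatest.FunSuite

class MonthsBetweenToleranceSuite extends FunSuite {
  test("monthsBetween compared with a tolerance") {
    val expected = 3.9495967741935485
    val actual = 3.949596773820191 // the value observed in the flaky run
    // Assert approximate rather than exact equality, so sub-microsecond
    // rounding differences do not fail the test.
    assert(math.abs(actual - expected) < 1e-6)
  }
}
{code}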



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23993) Support DESC FORMATTED table_name column_name

2018-04-30 Thread Sunitha Kambhampati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458822#comment-16458822
 ] 

Sunitha Kambhampati commented on SPARK-23993:
-

I just tried it out on current trunk and the command is supported.

I looked at the code: there is a DescribeColumnCommand, implemented as part of 
SPARK-17642 ([SQL] Support DESC EXTENDED/FORMATTED table column commands).
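
For reference, a quick way to check it from the Scala shell (the table and column 
names here are made up):

{code:java}
spark.sql("CREATE TABLE t (id INT, name STRING) USING parquet")
spark.sql("DESC FORMATTED t name").show(truncate = false)
// Prints the column-level metadata (col_name, data_type, comment), plus
// statistics such as min, max and num_nulls once they have been computed
// with: ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS name
{code}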

> Support DESC FORMATTED table_name column_name
> -
>
> Key: SPARK-23993
> URL: https://issues.apache.org/jira/browse/SPARK-23993
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.2
>Reporter: Volodymyr Glushak
>Priority: Major
>
> Hive and Spark both support:
> {code}
> DESC FORMATTED table_name
> {code}
> which gives table metadata.
> If you want to get metadata for a particular column in Hive, you can execute:
> {code}
> DESC FORMATTED table_name column_name
> {code}
> This is not supported in Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23975) Allow Clustering to take Arrays of Double as input features

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458778#comment-16458778
 ] 

Apache Spark commented on SPARK-23975:
--

User 'ludatabricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/21195

> Allow Clustering to take Arrays of Double as input features
> ---
>
> Key: SPARK-23975
> URL: https://issues.apache.org/jira/browse/SPARK-23975
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: Lu Wang
>Assignee: Lu Wang
>Priority: Major
>
> Clustering algorithms should accept Arrays in addition to Vectors as input 
> features. The Python interface should also be changed accordingly, which would 
> make PySpark a lot easier to use. 
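
Until arrays are accepted directly, a minimal spark-shell sketch of the usual 
workaround (column names and data are made up) is to convert the array column into 
an ML Vector column first:

{code:java}
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.functions.udf

// Hypothetical input with an array<double> feature column (spark-shell implicits assumed).
val df = Seq(Seq(0.0, 0.1), Seq(0.2, 0.0), Seq(9.0, 8.9), Seq(9.1, 9.0)).toDF("features_arr")

// Convert the array column into a Vector column that the ML estimators accept.
val toVec = udf((xs: Seq[Double]) => Vectors.dense(xs.toArray))
val vectorized = df.withColumn("features", toVec($"features_arr"))

val model = new KMeans().setK(2).setFeaturesCol("features").fit(vectorized)
{code}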



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24072) clearly define pushed filters

2018-04-30 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-24072.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> clearly define pushed filters
> -
>
> Key: SPARK-24072
> URL: https://issues.apache.org/jira/browse/SPARK-24072
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24122) Allow automatic driver restarts on K8s

2018-04-30 Thread Oz Ben-Ami (JIRA)
Oz Ben-Ami created SPARK-24122:
--

 Summary: Allow automatic driver restarts on K8s
 Key: SPARK-24122
 URL: https://issues.apache.org/jira/browse/SPARK-24122
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Oz Ben-Ami


[~foxish]

Right now SparkSubmit creates the driver as a bare pod, rather than as a managed 
controller like a Deployment or a StatefulSet. This means there is no way to 
guarantee automatic restarts, e.g. in case a node has an issue. Note that Pod 
RestartPolicy does not apply if a node fails. A StatefulSet would allow us to 
guarantee restarts, and keep the ability for executors to find the driver using DNS.

This is particularly helpful for long-running streaming workloads, where we 
currently use {{yarn.resourcemanager.am.max-attempts}} with YARN. I can confirm 
that Spark Streaming and Structured Streaming applications can be made to 
recover from such a restart, with the help of checkpointing. The executors will 
have to be started again by the driver, but this should not be a problem.

For batch processing, we could alternatively use Kubernetes {{Job}} objects, 
which restart pods on failure but not success. For example, note the semantics 
provided by the {{kubectl run}} 
[command|https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#run]
 * {{--restart=Never}}: bare Pod
 * {{--restart=Always}}: Deployment
 * {{--restart=OnFailure}}: Job

https://github.com/apache-spark-on-k8s/spark/issues/288



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24046) Rate Source doesn't gradually increase rate when rampUpTime>=RowsPerSecond

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458636#comment-16458636
 ] 

Apache Spark commented on SPARK-24046:
--

User 'maasg' has created a pull request for this issue:
https://github.com/apache/spark/pull/21194

> Rate Source doesn't gradually increase rate when rampUpTime>=RowsPerSecond
> --
>
> Key: SPARK-24046
> URL: https://issues.apache.org/jira/browse/SPARK-24046
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
> Environment: Spark 2.3.0 using Spark Shell on Ubuntu 17.4
> (Environment is not important, the issue lies in the rate calculation)
>Reporter: Gerard Maas
>Priority: Major
>  Labels: RateSource
> Attachments: image-2018-04-22-22-03-03-945.png, 
> image-2018-04-22-22-06-49-202.png
>
>
> When using the rate source in Structured Streaming, the `rampUpTime` feature 
> fails to gradually increase the stream rate when the `rampUpTime` option is 
> equal to or greater than `rowsPerSecond`. 
> When `rampUpTime >= rowsPerSecond`, all batches at `time < rampUpTime` contain 
> 0 values. The rate jumps to `rowsPerSecond` when `time > rampUpTime`.
> The following scenario, executed in the `spark-shell` demonstrates this issue:
> {code:java}
> // Using rampUpTime(10) > rowsPerSecond(5)  
> {code}
> {code:java}
> val stream = spark.readStream
> .format("rate")
> .option("rowsPerSecond", 5)
> .option("rampUpTime", 10)
> .load()
> val query = stream.writeStream.format("console").start()
> // Exiting paste mode, now interpreting.
> stream: org.apache.spark.sql.DataFrame = [timestamp: timestamp, value: bigint]
> query: org.apache.spark.sql.streaming.StreamingQuery = 
> org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@cf82c58
> ---
> Batch: 0
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 1
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 2
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 3
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 4
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 5
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 6
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 7
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 8
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 9
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 10
> ---
> +-+-+
> |timestamp|value|
> +-+-+
> +-+-+
> ---
> Batch: 11
> ---
> ++-+
> | timestamp|value|
> ++-+
> |2018-04-22 17:08:...| 0|
> |2018-04-22 17:08:...| 1|
> |2018-04-22 17:08:...| 2|
> |2018-04-22 17:08:...| 3|
> |2018-04-22 17:08:...| 4|
> ++-+
> ---
> Batch: 12
> ---
> ++-+
> | timestamp|value|
> ++-+
> |2018-04-22 17:08:...| 5|
> |2018-04-22 17:08:...| 6|
> |2018-04-22 17:08:...| 7|
> |2018-04-22 17:08:...| 8|
> |2018-04-22 17:08:...| 9|
> ++-+
> {code}
>  
> This scenario shows rowsPerSecond == rampUpTime,  which also fails
> {code:java}
> val stream = spark.readStream
> .format("rate")
> .option("rowsPerSecond", 10)
> .option("rampUpTime", 10)
> .load()
> val query = 

[jira] [Assigned] (SPARK-24121) The API for handling expression code generation in expression codegen

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24121:


Assignee: Apache Spark

> The API for handling expression code generation in expression codegen
> -
>
> Key: SPARK-24121
> URL: https://issues.apache.org/jira/browse/SPARK-24121
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> In order to achieve the replacement of expression values during expression 
> codegen (please see the proposal at 
> https://github.com/apache/spark/pull/19813#issuecomment-354045400), we need 
> an API to handle the insertion of temporary symbols for statements generated 
> by expressions. This API must let us know which statements come from 
> expressions during codegen, and let us use symbols instead of the actual code 
> when generating the Java source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24121) The API for handling expression code generation in expression codegen

2018-04-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24121:


Assignee: (was: Apache Spark)

> The API for handling expression code generation in expression codegen
> -
>
> Key: SPARK-24121
> URL: https://issues.apache.org/jira/browse/SPARK-24121
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> In order to achieve the replacement of expression values during expression 
> codegen (please see the proposal at 
> https://github.com/apache/spark/pull/19813#issuecomment-354045400), we need 
> an API to handle the insertion of temporary symbols for statements generated 
> by expressions. This API must let us know which statements come from 
> expressions during codegen, and let us use symbols instead of the actual code 
> when generating the Java source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24121) The API for handling expression code generation in expression codegen

2018-04-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458574#comment-16458574
 ] 

Apache Spark commented on SPARK-24121:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/21193

> The API for handling expression code generation in expression codegen
> -
>
> Key: SPARK-24121
> URL: https://issues.apache.org/jira/browse/SPARK-24121
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> In order to achieve the replacement of expression values during expression 
> codegen (please see the proposal at 
> https://github.com/apache/spark/pull/19813#issuecomment-354045400), we need 
> an API to handle the insertion of temporary symbols for statements generated 
> by expressions. This API must let us know which statements come from 
> expressions during codegen, and let us use symbols instead of the actual code 
> when generating the Java source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24121) The API for handling expression code generation in expression codegen

2018-04-30 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24121:
---

 Summary: The API for handling expression code generation in 
expression codegen
 Key: SPARK-24121
 URL: https://issues.apache.org/jira/browse/SPARK-24121
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Liang-Chi Hsieh


In order to achieve the replacement of expression values during expression codegen 
(please see the proposal at 
https://github.com/apache/spark/pull/19813#issuecomment-354045400), we need 
an API to handle the insertion of temporary symbols for statements generated by 
expressions. This API must let us know which statements come from expressions 
during codegen, and let us use symbols instead of the actual code when generating 
the Java source.
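
As a rough, standalone illustration of the idea (this is not Spark's actual API; 
all names below are made up): register each generated statement under a temporary 
symbol, compose generated code against the symbols, and substitute the real 
statements only when the final Java source is emitted.

{code:java}
import scala.collection.mutable

class StatementRegistry {
  private val statements = mutable.LinkedHashMap.empty[String, String]
  private var counter = 0

  // Register a generated statement and return a temporary symbol that other
  // generated code can reference instead of the statement text itself.
  def addStatement(code: String): String = {
    counter += 1
    val symbol = s"stmt_$counter"
    statements += symbol -> code
    symbol
  }

  // Replace every temporary symbol with its actual code when emitting the
  // final source.
  def emit(template: String): String =
    statements.foldLeft(template) { case (src, (symbol, code)) =>
      src.replace(symbol, code)
    }
}
{code}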



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date

2018-04-30 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458511#comment-16458511
 ] 

Emlyn Corrin commented on SPARK-23549:
--

Will this be included in Spark 2.3.1? It only lists 2.4.0 as the fix version.

> Spark SQL unexpected behavior when comparing timestamp to date
> --
>
> Key: SPARK-23549
> URL: https://issues.apache.org/jira/browse/SPARK-23549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
>Reporter: Dong Jiang
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>
> {code:java}
> scala> spark.version
> res1: String = 2.2.1
> scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between 
> cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
> +---+
> |((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= 
> CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 
> AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
> +---+
> |                                                                             
>                                                                               
>                                                false|
> +---+{code}
> As shown above, when a timestamp is compared to a date in Spark SQL, both the 
> timestamp and the date are downcast to string, leading to an unexpected result. 
> If I run the same SQL in Presto/Athena, I get the expected result:
> {code:java}
> select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as 
> date) and cast('2017-03-01' as date)
>   _col0
> 1 true
> {code}
> Is this a bug in Spark or a feature?
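
For versions without the fix, one possible workaround (a sketch, shown from the 
Scala shell) is to cast the date boundaries to timestamp explicitly, so the 
comparison is performed on timestamps rather than on strings:

{code:java}
spark.sql("""
  SELECT CAST('2017-03-01 00:00:00' AS TIMESTAMP)
         BETWEEN CAST(CAST('2017-02-28' AS DATE) AS TIMESTAMP)
             AND CAST(CAST('2017-03-01' AS DATE) AS TIMESTAMP)
""").show(truncate = false)
// expected to return true, matching the Presto/Athena result above
{code}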



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23840) PySpark error when converting a DataFrame to rdd

2018-04-30 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-23840.
--
Resolution: Invalid

I am leaving this resolved. It sounds like there's no way to go further without 
more information.

> PySpark error when converting a DataFrame to rdd
> 
>
> Key: SPARK-23840
> URL: https://issues.apache.org/jira/browse/SPARK-23840
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Uri Goren
>Priority: Major
>
> I am running code in the `pyspark` shell on an EMR cluster and am encountering 
> an error I have never seen before.
> This line works:
> spark.read.parquet(s3_input).take(99)
> While this line causes an exception:
> spark.read.parquet(s3_input).rdd.take(99)
> With
> > TypeError: 'int' object is not iterable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org