[jira] [Assigned] (SPARK-29490) Reset 'WritableColumnVector' in 'RowToColumnarExec'

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29490:
-

Assignee: Rong Ma

> Reset 'WritableColumnVector' in 'RowToColumnarExec'
> ---
>
> Key: SPARK-29490
> URL: https://issues.apache.org/jira/browse/SPARK-29490
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Rong Ma
>Assignee: Rong Ma
>Priority: Major
>
> When converting {{Iterator[InternalRow]}} to {{Iterator[ColumnarBatch]}}, the 
> vectors used to create a new {{ColumnarBatch}} should be reset in the 
> iterator's {{next()}} method.
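A minimal sketch of the idea, under stated assumptions (the actual change is in the pull request linked later in this thread; {{writeRow}} is a hypothetical row-to-column converter, not a Spark API):

{code:scala}
// Illustrative only: next() must reset the reused WritableColumnVectors
// before writing the new batch, or values from the previous batch leak in.
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.vectorized.WritableColumnVector
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

def nextBatch(
    rows: Iterator[InternalRow],
    vectors: Array[WritableColumnVector],
    writeRow: InternalRow => Unit,  // hypothetical converter into `vectors`
    batchSize: Int): ColumnarBatch = {
  vectors.foreach(_.reset())        // clear data left over from the last batch
  var numRows = 0
  while (rows.hasNext && numRows < batchSize) {
    writeRow(rows.next())
    numRows += 1
  }
  // JVM arrays are covariant, so this cast is safe at runtime.
  val batch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]])
  batch.setNumRows(numRows)
  batch
}
{code}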



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29490) Reset 'WritableColumnVector' in 'RowToColumnarExec'

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29490.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26137
[https://github.com/apache/spark/pull/26137]

> Reset 'WritableColumnVector' in 'RowToColumnarExec'
> ---
>
> Key: SPARK-29490
> URL: https://issues.apache.org/jira/browse/SPARK-29490
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Rong Ma
>Assignee: Rong Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> When converting {{Iterator[InternalRow]}} to {{Iterator[ColumnarBatch]}}, the 
> vectors used to create a new {{ColumnarBatch}} should be reset in the 
> iterator's {{next()}} method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29554) Add `version` SQL function

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29554:
--
Description: 
|string|version()|Returns the Hive version (as of Hive 2.1.0). The string 
contains 2 fields, the first being a build number and the second being a build 
hash. Example: "select version();" might return "2.1.0 
r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your 
build.|

 

 

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]

  was:
|string|version()|Returns the Hive version (as of Hive 2.1.0). The string 
contains 2 fields, the first being a build number and the second being a build 
hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 
r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your 
build.|

 

 

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]


> Add `version` SQL function
> --
>
> Key: SPARK-29554
> URL: https://issues.apache.org/jira/browse/SPARK-29554
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> |string|version()|Returns the Hive version (as of Hive 2.1.0). The string 
> contains 2 fields, the first being a build number and the second being a 
> build hash. Example: "select version();" might return "2.1.0 
> r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on 
> your build.|
>  
>  
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]
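A usage sketch of the new function (assumes a running {{spark-shell}} on a build that includes this change; the revision hash is a placeholder, since actual results depend on your build):

{code:scala}
spark.sql("SELECT version()").show(truncate = false)
// Expected shape of the result: "<spark version> <build revision hash>", e.g.
// 3.0.0 <revision hash>
{code}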



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29554) Add a misc function named version

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29554:
-

Assignee: Kent Yao

> Add a misc function named version
> -
>
> Key: SPARK-29554
> URL: https://issues.apache.org/jira/browse/SPARK-29554
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> |string|version()|Returns the Hive version (as of Hive 2.1.0). The string 
> contains 2 fields, the first being a build number and the second being a 
> build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 
> r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on 
> your build.|
>  
>  
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29554) Add a misc function named version

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29554.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26209
[https://github.com/apache/spark/pull/26209]

> Add a misc function named version
> -
>
> Key: SPARK-29554
> URL: https://issues.apache.org/jira/browse/SPARK-29554
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> |string|version()|Returns the Hive version (as of Hive 2.1.0). The string 
> contains 2 fields, the first being a build number and the second being a 
> build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 
> r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on 
> your build.|
>  
>  
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29554) Add `version` SQL function

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29554:
--
Summary: Add `version` SQL function  (was: Add a misc function named 
version)

> Add `version` SQL function
> --
>
> Key: SPARK-29554
> URL: https://issues.apache.org/jira/browse/SPARK-29554
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> |string|version()|Returns the Hive version (as of Hive 2.1.0). The string 
> contains 2 fields, the first being a build number and the second being a 
> build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 
> r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on 
> your build.|
>  
>  
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29610) Keys with Null values are discarded when using to_json function

2019-10-25 Thread Jonathan (Jira)
Jonathan created SPARK-29610:


 Summary: Keys with Null values are discarded when using to_json 
function
 Key: SPARK-29610
 URL: https://issues.apache.org/jira/browse/SPARK-29610
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.4.4
Reporter: Jonathan


When calling {{to_json}} on a struct, if a key has a null value then the key is 
thrown away.
{code:java}
import pyspark
import pyspark.sql.functions as F

l = [("a", "foo"), ("b", None)]
df = spark.createDataFrame(l, ["id", "data"])
(
    df.select(F.struct("*").alias("payload"))
      .withColumn("payload", F.to_json(F.col("payload")))
      .select("payload")
      .show()
){code}
Produces the following output:
{noformat}
+--------------------+
|             payload|
+--------------------+
|{"id":"a","data":...|
|          {"id":"b"}|
+--------------------+{noformat}
The `data` key in the second row has just been silently deleted.
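For comparison, a Scala sketch of the same behavior. The {{ignoreNullFields}} option shown here is an assumption that applies only to Spark 3.0+ (added by SPARK-29444); on 2.4.x, null-valued fields are always dropped by {{to_json}}:

{code:scala}
// Assumes a spark-shell session on Spark 3.0+.
import org.apache.spark.sql.functions.{col, struct, to_json}
import spark.implicits._

val df = Seq(("a", Some("foo")), ("b", Option.empty[String])).toDF("id", "data")
df.select(to_json(struct(col("*")), Map("ignoreNullFields" -> "false")).alias("payload"))
  .show(truncate = false)
// Keeps {"id":"b","data":null} instead of producing {"id":"b"}.
{code}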



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29033) Always use CreateNamedStructUnsafe, the UnsafeRow-based version of the CreateNamedStruct codepath

2019-10-25 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-29033.

Resolution: Won't Fix

I'm resolving this as "Won't Fix" because we ended up doing the opposite in 
SPARK-29503, removing CreateNamedStructUnsafe because its sole use was 
associated with a correctness bug.

> Always use CreateNamedStructUnsafe, the UnsafeRow-based version of the 
> CreateNamedStruct codepath
> -
>
> Key: SPARK-29033
> URL: https://issues.apache.org/jira/browse/SPARK-29033
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>
> Spark 2.x has two separate implementations of the "create named struct" 
> expression: regular {{CreateNamedStruct}} and {{CreateNamedStructUnsafe}}. 
> The "unsafe" version was added in SPARK-9373 to support structs in 
> {{GenerateUnsafeProjection}}. These two expressions both extend the 
> {{CreateNamedStructLike}} trait.
> For Spark 3.0, I propose to always use the "unsafe" code path: this will 
> avoid object allocation / boxing inefficiencies in the "safe" codepath, which 
> is an especially big problem when generating Encoders for deeply-nested case 
> classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-10-25 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960244#comment-16960244
 ] 

feiwang edited comment on SPARK-27736 at 10/26/19 2:30 AM:
---

Hi, we ran into this issue recently.
[~joshrosen] [~tgraves]
How about implementing a simple solution:
* let the ExternalShuffleClient query whether an executor is registered with the ESS
* when a FetchFailedException is thrown, check whether its executor is registered 
with the ESS
* if not, remove all outputs of the executors on that host that are not registered

If this is OK, I can implement it.


was (Author: hzfeiwang):
Hi, we ran into this issue recently.
[~joshrosen] [~tgraves]
How about implementing a simple solution:
* let the ExternalShuffleClient query whether an executor is registered with the ESS
* when removing an executor, check whether it is registered with the ESS
* if not, remove all outputs of the executors on that host that are not registered

If this is OK, I can implement it.

> Improve handling of FetchFailures caused by ExternalShuffleService losing 
> track of executor registrations
> -
>
> Key: SPARK-27736
> URL: https://issues.apache.org/jira/browse/SPARK-27736
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Minor
>
> This ticket describes a fault-tolerance edge-case which can cause Spark jobs 
> to fail if a single external shuffle service process reboots and fails to 
> recover the list of registered executors (something which can happen when 
> using YARN if NodeManager recovery is disabled) _and_ the Spark job has a 
> large number of executors per host.
> I believe this problem can be worked around today via a change of 
> configurations, but I'm filing this issue to (a) better document this 
> problem, and (b) propose either a change of default configurations or 
> additional DAGScheduler logic to better handle this failure mode.
> h2. Problem description
> The external shuffle service process is _mostly_ stateless except for a map 
> tracking the set of registered applications and executors.
> When processing a shuffle fetch request, the shuffle services first checks 
> whether the requested block ID's executor is registered; if it's not 
> registered then the shuffle service throws an exception like 
> {code:java}
> java.lang.RuntimeException: Executor is not registered 
> (appId=application_1557557221330_6891, execId=428){code}
> and this exception becomes a {{FetchFailed}} error in the executor requesting 
> the shuffle block.
> In normal operation this error should not occur because executors shouldn't 
> be mis-routing shuffle fetch requests. However, this _can_ happen if the 
> shuffle service crashes and restarts, causing it to lose its in-memory 
> executor registration state. With YARN this state can be recovered from disk 
> if YARN NodeManager recovery is enabled (using the mechanism added in 
> SPARK-9439), but I don't believe that we perform state recovery in Standalone 
> and Mesos modes (see SPARK-24223).
> If state cannot be recovered then map outputs cannot be served (even though 
> the files probably still exist on disk). In theory, this shouldn't cause 
> Spark jobs to fail because we can always redundantly recompute lost / 
> unfetchable map outputs.
> However, in practice this can cause total job failures in deployments where 
> the node with the failed shuffle service was running a large number of 
> executors: by default, the DAGScheduler unregisters map outputs _only from 
> individual executor whose shuffle blocks could not be fetched_ (see 
> [code|https://github.com/apache/spark/blame/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1643]),
>  so it can take several rounds of failed stage attempts to fail and clear 
> output from all executors on the faulty host. If the number of executors on a 
> host is greater than the stage retry limit then this can exhaust stage retry 
> attempts and cause job failures.
> This "multiple rounds of recomputation to discover all failed executors on a 
> host" problem was addressed by SPARK-19753, which added a 
> {{spark.files.fetchFailure.unRegisterOutputOnHost}} configuration which 
> promotes executor fetch failures into host-wide fetch failures (clearing 
> output from all neighboring executors upon a single failure). However, that 
> configuration is {{false}} by default.
> h2. Potential solutions
> I have a few ideas about how we can improve this situation:
>  - Update the [YARN external shuffle service 
> documentation|https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service]
> 

[jira] [Commented] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-10-25 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960244#comment-16960244
 ] 

feiwang commented on SPARK-27736:
-

Hi, we ran into this issue recently.
[~joshrosen] [~tgraves]
How about implementing a simple solution:
* let the ExternalShuffleClient query whether an executor is registered with the ESS
* when removing an executor, check whether it is registered with the ESS
* if not, remove all outputs of the executors on that host that are not registered

If this is OK, I can implement it.
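A rough, self-contained sketch of the proposed flow (every name below is a hypothetical placeholder; the ESS registration query and the scheduler hook do not exist in Spark today):

{code:scala}
// Hypothetical: expose a registration query on the shuffle service client.
trait ShuffleServiceView {
  def isExecutorRegistered(execId: String): Boolean
}

// Hypothetical handler: on a fetch failure, if the ESS has forgotten the
// executor, unregister outputs of every forgotten executor on that host.
def handleFetchFailure(
    failedExecId: String,
    executorsOnHost: Seq[String],
    ess: ShuffleServiceView,
    unregisterOutputs: String => Unit): Unit = {
  if (!ess.isExecutorRegistered(failedExecId)) {
    executorsOnHost.filterNot(ess.isExecutorRegistered).foreach(unregisterOutputs)
  }
}
{code}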

> Improve handling of FetchFailures caused by ExternalShuffleService losing 
> track of executor registrations
> -
>
> Key: SPARK-27736
> URL: https://issues.apache.org/jira/browse/SPARK-27736
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Minor
>
> This ticket describes a fault-tolerance edge-case which can cause Spark jobs 
> to fail if a single external shuffle service process reboots and fails to 
> recover the list of registered executors (something which can happen when 
> using YARN if NodeManager recovery is disabled) _and_ the Spark job has a 
> large number of executors per host.
> I believe this problem can be worked around today via a change of 
> configurations, but I'm filing this issue to (a) better document this 
> problem, and (b) propose either a change of default configurations or 
> additional DAGScheduler logic to better handle this failure mode.
> h2. Problem description
> The external shuffle service process is _mostly_ stateless except for a map 
> tracking the set of registered applications and executors.
> When processing a shuffle fetch request, the shuffle services first checks 
> whether the requested block ID's executor is registered; if it's not 
> registered then the shuffle service throws an exception like 
> {code:java}
> java.lang.RuntimeException: Executor is not registered 
> (appId=application_1557557221330_6891, execId=428){code}
> and this exception becomes a {{FetchFailed}} error in the executor requesting 
> the shuffle block.
> In normal operation this error should not occur because executors shouldn't 
> be mis-routing shuffle fetch requests. However, this _can_ happen if the 
> shuffle service crashes and restarts, causing it to lose its in-memory 
> executor registration state. With YARN this state can be recovered from disk 
> if YARN NodeManager recovery is enabled (using the mechanism added in 
> SPARK-9439), but I don't believe that we perform state recovery in Standalone 
> and Mesos modes (see SPARK-24223).
> If state cannot be recovered then map outputs cannot be served (even though 
> the files probably still exist on disk). In theory, this shouldn't cause 
> Spark jobs to fail because we can always redundantly recompute lost / 
> unfetchable map outputs.
> However, in practice this can cause total job failures in deployments where 
> the node with the failed shuffle service was running a large number of 
> executors: by default, the DAGScheduler unregisters map outputs _only from 
> the individual executor whose shuffle blocks could not be fetched_ (see 
> [code|https://github.com/apache/spark/blame/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1643]),
>  so it can take several rounds of failed stage attempts to fail and clear 
> output from all executors on the faulty host. If the number of executors on a 
> host is greater than the stage retry limit then this can exhaust stage retry 
> attempts and cause job failures.
> This "multiple rounds of recomputation to discover all failed executors on a 
> host" problem was addressed by SPARK-19753, which added a 
> {{spark.files.fetchFailure.unRegisterOutputOnHost}} configuration which 
> promotes executor fetch failures into host-wide fetch failures (clearing 
> output from all neighboring executors upon a single failure). However, that 
> configuration is {{false}} by default.
> h2. Potential solutions
> I have a few ideas about how we can improve this situation:
>  - Update the [YARN external shuffle service 
> documentation|https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service]
>  to recommend enabling node manager recovery.
>  - Consider defaulting {{spark.files.fetchFailure.unRegisterOutputOnHost}} to 
> {{true}}. This would improve out-of-the-box resiliency for large clusters. 
> The trade-off here is a reduction of efficiency in case there are transient 
> "false positive" fetch failures, but I suspect this case may be unlikely in 
> practice (so the change of default could be an acceptable trade-off). See 
> [prior discussion on 
>
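A minimal sketch of the configuration workaround named in the description above (the key is a real Spark conf, {{false}} by default):

{code:scala}
// Promote a fetch failure from one executor to a host-wide failure, so a
// single failure unregisters all map outputs on the faulty host at once.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  .getOrCreate()
{code}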

[jira] [Updated] (SPARK-29608) Add Hadoop 3.2 profile to binary package

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29608:
--
Issue Type: Improvement  (was: Task)

> Add Hadoop 3.2 profile to binary package
> 
>
> Key: SPARK-29608
> URL: https://issues.apache.org/jira/browse/SPARK-29608
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29608) Add Hadoop 3.2 profile to binary package

2019-10-25 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-29608:
-

 Summary: Add Hadoop 3.2 profile to binary package
 Key: SPARK-29608
 URL: https://issues.apache.org/jira/browse/SPARK-29608
 Project: Spark
  Issue Type: Task
  Components: Build
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29608) Add Hadoop 3.2 profile to binary package

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29608.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26260
[https://github.com/apache/spark/pull/26260]

> Add Hadoop 3.2 profile to binary package
> 
>
> Key: SPARK-29608
> URL: https://issues.apache.org/jira/browse/SPARK-29608
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25906) spark-shell cannot handle `-i` option correctly

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25906:
--
Fix Version/s: (was: 3.0.0)

> spark-shell cannot handle `-i` option correctly
> ---
>
> Key: SPARK-25906
> URL: https://issues.apache.org/jira/browse/SPARK-25906
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 2.4.1
>
>
> This is a regression on Spark 2.4.0.
> *Spark 2.3.2*
> {code:java}
> $ cat test.scala
> spark.version
> case class Record(key: Int, value: String)
> spark.sparkContext.parallelize((1 to 2).map(i => Record(i, 
> s"val_$i"))).toDF.show
> $ bin/spark-shell -i test.scala
> 18/10/31 23:22:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Spark context Web UI available at http://localhost:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1541053368478).
> Spark session available as 'spark'.
> Loading test.scala...
> res0: String = 2.3.2
> defined class Record
> 18/10/31 23:22:56 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> +---+-+
> |key|value|
> +---+-+
> |  1|val_1|
> |  2|val_2|
> +---+-+
> {code}
> *Spark 2.4.0 RC5*
> {code:java}
> $ bin/spark-shell -i test.scala
> 2018-10-31 23:23:14 WARN  NativeCodeLoader:62 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Spark context Web UI available at http://localhost:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1541053400312).
> Spark session available as 'spark'.
> test.scala:17: error: value toDF is not a member of 
> org.apache.spark.rdd.RDD[Record]
> Error occurred in an application involving default arguments.
>spark.sparkContext.parallelize((1 to 2).map(i => Record(i, 
> s"val_$i"))).toDF.show
> {code}
> *WORKAROUND*
> Add the following line at the first of the script.
> {code}
> import spark.implicits._
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29607) Move static methods from CalendarInterval to IntervalUtils

2019-10-25 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29607:
--

 Summary: Move static methods from CalendarInterval to IntervalUtils
 Key: SPARK-29607
 URL: https://issues.apache.org/jira/browse/SPARK-29607
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Move the static methods from the CalendarInterval class to the helper object 
IntervalUtils. This requires rewriting the Java code in Scala.
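A minimal sketch of the refactoring pattern (the method shown is illustrative only; the actual set of moved methods is defined by the eventual PR):

{code:scala}
// A Java static helper on CalendarInterval becomes a method on a Scala
// helper object.
import org.apache.spark.unsafe.types.CalendarInterval

object IntervalUtils {
  // hypothetical example method, not the real moved API
  def totalMonths(interval: CalendarInterval): Int = interval.months
}
{code}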



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29608) Add Hadoop 3.2 profile to binary package

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29608:
-

Assignee: Dongjoon Hyun

> Add Hadoop 3.2 profile to binary package
> 
>
> Key: SPARK-29608
> URL: https://issues.apache.org/jira/browse/SPARK-29608
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29609) DataSourceV2: Support DROP NAMESPACE

2019-10-25 Thread Terry Kim (Jira)
Terry Kim created SPARK-29609:
-

 Summary: DataSourceV2: Support DROP NAMESPACE
 Key: SPARK-29609
 URL: https://issues.apache.org/jira/browse/SPARK-29609
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Terry Kim


DROP NAMESPACE needs to support v2 catalogs.
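The statement in question, as it would be issued against a v2 catalog (a sketch: the catalog name {{testcat}} is a placeholder that must be registered via {{spark.sql.catalog.testcat}}, and it assumes CREATE NAMESPACE is likewise supported):

{code:scala}
spark.sql("CREATE NAMESPACE testcat.ns")
spark.sql("DROP NAMESPACE testcat.ns")
{code}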



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29580) Add kerberos debug messages for Kafka secure tests

2019-10-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960077#comment-16960077
 ] 

Dongjoon Hyun commented on SPARK-29580:
---

We will open a new JIRA if this happens again. For now, the PR is the best 
effort we can make.
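For reference, the standard JDK switches for this kind of debugging (generic Kerberos/JAAS flags, not necessarily what the PR wires up):

{code:scala}
// Must be set before the login modules are first used, e.g. via the test
// JVM's system properties.
System.setProperty("sun.security.krb5.debug", "true")
System.setProperty("sun.security.jgss.debug", "true")
{code}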

> Add kerberos debug messages for Kafka secure tests
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
>   at 
> org.apache.kafka.common.net

[jira] [Updated] (SPARK-29580) Add kerberos debug messages for Kafka secure tests

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29580:
--
Summary: Add kerberos debug messages for Kafka secure tests  (was: 
KafkaDelegationTokenSuite fails to create new KafkaAdminClient)

> Add kerberos debug messages for Kafka secure tests
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
>   at 
> org.apache.kafka.common.network.SaslChannel

[jira] [Resolved] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29580.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26252
[https://github.com/apache/spark/pull/26252]

> KafkaDelegationTokenSuite fails to create new KafkaAdminClient
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
>   at 
> org.apache.kafka.com

[jira] [Assigned] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient

2019-10-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29580:
-

Assignee: Gabor Somogyi

> KafkaDelegationTokenSuite fails to create new KafkaAdminClient
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Gabor Somogyi
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:149)
>   ... 20 more
> Caused by: sbt.ForkMain$ForkError: sun.

[jira] [Commented] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient

2019-10-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959962#comment-16959962
 ] 

Dongjoon Hyun commented on SPARK-29580:
---

Thank you for taking a look, [~gsomogyi]. +1 for waiting for the next failure. 
If it doesn't happen frequently, it's okay to leave this as-is. Your time is 
precious.

> KafkaDelegationTokenSuite fails to create new KafkaAdminClient
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
> 

[jira] [Resolved] (SPARK-29414) HasOutputCol param isSet() property is not preserved after persistence

2019-10-25 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-29414.
--
Fix Version/s: 2.4.4
   Resolution: Fixed

Thanks [~borys.biletskyy], I'll mark this as resolved for 2.4.4 then.

> HasOutputCol param isSet() property is not preserved after persistence
> --
>
> Key: SPARK-29414
> URL: https://issues.apache.org/jira/browse/SPARK-29414
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.3.2
>Reporter: Borys Biletskyy
>Priority: Major
> Fix For: 2.4.4
>
>
> HasOutputCol param isSet() property is not preserved after saving and loading 
> using DefaultParamsReadable and DefaultParamsWritable.
> {code:java}
> import pytest
> from pyspark import keyword_only
> from pyspark.ml import Model
> from pyspark.sql import DataFrame
> from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
> from pyspark.ml.param.shared import HasInputCol, HasOutputCol
> from pyspark.sql.functions import *
>
>
> class HasOutputColTester(Model,
>                          HasInputCol,
>                          HasOutputCol,
>                          DefaultParamsReadable,
>                          DefaultParamsWritable):
>     @keyword_only
>     def __init__(self, inputCol: str = None, outputCol: str = None):
>         super(HasOutputColTester, self).__init__()
>         kwargs = self._input_kwargs
>         self.setParams(**kwargs)
>
>     @keyword_only
>     def setParams(self, inputCol: str = None, outputCol: str = None):
>         kwargs = self._input_kwargs
>         self._set(**kwargs)
>         return self
>
>     def _transform(self, data: DataFrame) -> DataFrame:
>         return data
>
>
> class TestHasInputColParam(object):
>     def test_persist_input_col_set(self, spark, temp_dir):
>         path = temp_dir + '/test_model'
>         model = HasOutputColTester()
>         assert not model.isDefined(model.inputCol)
>         assert not model.isSet(model.inputCol)
>         assert model.isDefined(model.outputCol)
>         assert not model.isSet(model.outputCol)
>         model.write().overwrite().save(path)
>         loaded_model: HasOutputColTester = HasOutputColTester.load(path)
>         assert not loaded_model.isDefined(model.inputCol)
>         assert not loaded_model.isSet(model.inputCol)
>         assert loaded_model.isDefined(model.outputCol)
>         assert not loaded_model.isSet(model.outputCol)  # AssertionError: assert not True
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations

2019-10-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29508.
--
Resolution: Won't Fix

> Implicitly cast strings in datetime arithmetic operations
> -
>
> Key: SPARK-29508
> URL: https://issues.apache.org/jira/browse/SPARK-29508
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Minor
>
> To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` 
> types in the cases:
>  # Cast string to interval in interval - string
>  # Cast string to interval in datetime + string or string + datetime
>  # Cast string to timestamp in datetime - string or string - datetime
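Illustrative queries for the three cases (hypothetical: this issue was resolved as Won't Fix, so Spark does not perform these casts implicitly):

{code:scala}
spark.sql("SELECT interval 1 day - '1 hour'")                   // 1: string -> interval
spark.sql("SELECT timestamp '2019-10-25 00:00:00' + '1 hour'")  // 2: string -> interval
spark.sql("SELECT timestamp '2019-10-25 00:00:00' - '2019-10-24 00:00:00'")  // 3: string -> timestamp
{code}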



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29414) HasOutputCol param isSet() property is not preserved after persistence

2019-10-25 Thread Borys Biletskyy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959874#comment-16959874
 ] 

Borys Biletskyy commented on SPARK-29414:
-

The test passes in v. 2.4.4 and persistence works as expected.

> HasOutputCol param isSet() property is not preserved after persistence
> --
>
> Key: SPARK-29414
> URL: https://issues.apache.org/jira/browse/SPARK-29414
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.3.2
>Reporter: Borys Biletskyy
>Priority: Major
>
> HasOutputCol param isSet() property is not preserved after saving and loading 
> using DefaultParamsReadable and DefaultParamsWritable.
> {code:java}
> import pytest
> from pyspark import keyword_only
> from pyspark.ml import Model
> from pyspark.sql import DataFrame
> from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
> from pyspark.ml.param.shared import HasInputCol, HasOutputCol
> from pyspark.sql.functions import *
>
>
> class HasOutputColTester(Model,
>                          HasInputCol,
>                          HasOutputCol,
>                          DefaultParamsReadable,
>                          DefaultParamsWritable):
>     @keyword_only
>     def __init__(self, inputCol: str = None, outputCol: str = None):
>         super(HasOutputColTester, self).__init__()
>         kwargs = self._input_kwargs
>         self.setParams(**kwargs)
>
>     @keyword_only
>     def setParams(self, inputCol: str = None, outputCol: str = None):
>         kwargs = self._input_kwargs
>         self._set(**kwargs)
>         return self
>
>     def _transform(self, data: DataFrame) -> DataFrame:
>         return data
>
>
> class TestHasInputColParam(object):
>     def test_persist_input_col_set(self, spark, temp_dir):
>         path = temp_dir + '/test_model'
>         model = HasOutputColTester()
>         assert not model.isDefined(model.inputCol)
>         assert not model.isSet(model.inputCol)
>         assert model.isDefined(model.outputCol)
>         assert not model.isSet(model.outputCol)
>         model.write().overwrite().save(path)
>         loaded_model: HasOutputColTester = HasOutputColTester.load(path)
>         assert not loaded_model.isDefined(model.inputCol)
>         assert not loaded_model.isSet(model.inputCol)
>         assert loaded_model.isDefined(model.outputCol)
>         assert not loaded_model.isSet(model.outputCol)  # AssertionError: assert not True
> {code}
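
The distinction the test relies on: a Param that only has a default value is isDefined but not isSet, and the report is that loading flips the latter to true. A minimal sketch of the same Param semantics in the Scala API (Binarizer chosen arbitrarily; its threshold param has a default):

{code:scala}
import org.apache.spark.ml.feature.Binarizer

val b = new Binarizer()      // threshold has a default value of 0.0
b.isDefined(b.threshold)     // true:  a default (or explicitly set) value exists
b.isSet(b.threshold)         // false: the user never set it explicitly
b.setThreshold(0.5)
b.isSet(b.threshold)         // true only after an explicit set
{code}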



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29606) Improve EliminateOuterJoin performance

2019-10-25 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-29606:
---

 Summary: Improve EliminateOuterJoin performance
 Key: SPARK-29606
 URL: https://issues.apache.org/jira/browse/SPARK-29606
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


How to reproduce:
{code:scala}

sql(
  """
|CREATE TABLE `big_table1`(`adj_type_id` tinyint, `byr_cntry_id` 
decimal(4,0), `sap_category_id` decimal(9,0), `lstg_site_id` decimal(9,0), 
`lstg_type_code` decimal(4,0), `offrd_slng_chnl_grp_id` smallint, 
`slr_cntry_id` decimal(4,0), `sold_slng_chnl_grp_id` smallint, `bin_lstg_yn_id` 
tinyint, `bin_sold_yn_id` tinyint, `lstg_curncy_id` decimal(4,0), 
`blng_curncy_id` decimal(4,0), `bid_count` decimal(18,0), `ck_trans_count` 
decimal(18,0), `ended_bid_count` decimal(18,0), `new_lstg_count` decimal(18,0), 
`ended_lstg_count` decimal(18,0), `ended_success_lstg_count` decimal(18,0), 
`item_sold_count` decimal(18,0), `gmv_us_amt` decimal(18,2), `gmv_byr_lc_amt` 
decimal(18,2), `gmv_slr_lc_amt` decimal(18,2), `gmv_lstg_curncy_amt` 
decimal(18,2), `gmv_us_m_amt` decimal(18,2), `rvnu_insrtn_fee_us_amt` 
decimal(18,6), `rvnu_insrtn_fee_lc_amt` decimal(18,6), `rvnu_insrtn_fee_bc_amt` 
decimal(18,6), `rvnu_insrtn_fee_us_m_amt` decimal(18,6), 
`rvnu_insrtn_crd_us_amt` decimal(18,6), `rvnu_insrtn_crd_lc_amt` decimal(18,6), 
`rvnu_insrtn_crd_bc_amt` decimal(18,6), `rvnu_insrtn_crd_us_m_amt` 
decimal(18,6), `rvnu_fetr_fee_us_amt` decimal(18,6), `rvnu_fetr_fee_lc_amt` 
decimal(18,6), `rvnu_fetr_fee_bc_amt` decimal(18,6), `rvnu_fetr_fee_us_m_amt` 
decimal(18,6), `rvnu_fetr_crd_us_amt` decimal(18,6), `rvnu_fetr_crd_lc_amt` 
decimal(18,6), `rvnu_fetr_crd_bc_amt` decimal(18,6), `rvnu_fetr_crd_us_m_amt` 
decimal(18,6), `rvnu_fv_fee_us_amt` decimal(18,6), `rvnu_fv_fee_slr_lc_amt` 
decimal(18,6), `rvnu_fv_fee_byr_lc_amt` decimal(18,6), `rvnu_fv_fee_bc_amt` 
decimal(18,6), `rvnu_fv_fee_us_m_amt` decimal(18,6), `rvnu_fv_crd_us_amt` 
decimal(18,6), `rvnu_fv_crd_byr_lc_amt` decimal(18,6), `rvnu_fv_crd_slr_lc_amt` 
decimal(18,6), `rvnu_fv_crd_bc_amt` decimal(18,6), `rvnu_fv_crd_us_m_amt` 
decimal(18,6), `rvnu_othr_l_fee_us_amt` decimal(18,6), `rvnu_othr_l_fee_lc_amt` 
decimal(18,6), `rvnu_othr_l_fee_bc_amt` decimal(18,6), 
`rvnu_othr_l_fee_us_m_amt` decimal(18,6), `rvnu_othr_l_crd_us_amt` 
decimal(18,6), `rvnu_othr_l_crd_lc_amt` decimal(18,6), `rvnu_othr_l_crd_bc_amt` 
decimal(18,6), `rvnu_othr_l_crd_us_m_amt` decimal(18,6), 
`rvnu_othr_nl_fee_us_amt` decimal(18,6), `rvnu_othr_nl_fee_lc_amt` 
decimal(18,6), `rvnu_othr_nl_fee_bc_amt` decimal(18,6), 
`rvnu_othr_nl_fee_us_m_amt` decimal(18,6), `rvnu_othr_nl_crd_us_amt` 
decimal(18,6), `rvnu_othr_nl_crd_lc_amt` decimal(18,6), 
`rvnu_othr_nl_crd_bc_amt` decimal(18,6), `rvnu_othr_nl_crd_us_m_amt` 
decimal(18,6), `rvnu_slr_tools_fee_us_amt` decimal(18,6), 
`rvnu_slr_tools_fee_lc_amt` decimal(18,6), `rvnu_slr_tools_fee_bc_amt` 
decimal(18,6), `rvnu_slr_tools_fee_us_m_amt` decimal(18,6), 
`rvnu_slr_tools_crd_us_amt` decimal(18,6), `rvnu_slr_tools_crd_lc_amt` 
decimal(18,6), `rvnu_slr_tools_crd_bc_amt` decimal(18,6), 
`rvnu_slr_tools_crd_us_m_amt` decimal(18,6), `rvnu_unasgnd_us_amt` 
decimal(18,6), `rvnu_unasgnd_lc_amt` decimal(18,6), `rvnu_unasgnd_bc_amt` 
decimal(18,6), `rvnu_unasgnd_us_m_amt` decimal(18,6), `rvnu_ad_fee_us_amt` 
decimal(18,6), `rvnu_ad_fee_lc_amt` decimal(18,6), `rvnu_ad_fee_bc_amt` 
decimal(18,6), `rvnu_ad_fee_us_m_amt` decimal(18,6), `rvnu_ad_crd_us_amt` 
decimal(18,6), `rvnu_ad_crd_lc_amt` decimal(18,6), `rvnu_ad_crd_bc_amt` 
decimal(18,6), `rvnu_ad_crd_us_m_amt` decimal(18,6), `rvnu_othr_ad_fee_us_amt` 
decimal(18,6), `rvnu_othr_ad_fee_lc_amt` decimal(18,6), 
`rvnu_othr_ad_fee_bc_amt` decimal(18,6), `rvnu_othr_ad_fee_us_m_amt` 
decimal(18,6), `cre_date` date, `cre_user` string, `upd_date` timestamp, 
`upd_user` string, `cmn_mtrc_summ_dt` date)
|USING parquet
|PARTITIONED BY (`cmn_mtrc_summ_dt`)
|""".stripMargin)

sql(
  """
|CREATE TABLE `small_table1` (`CURNCY_ID` DECIMAL(9,0), 
`CURNCY_PLAN_RATE` DECIMAL(18,6), `CRE_DATE` DATE, `CRE_USER` STRING, 
`UPD_DATE` TIMESTAMP, `UPD_USER` STRING)
|USING parquet
|CLUSTERED BY (CURNCY_ID)
|SORTED BY (CURNCY_ID)
|INTO 1 BUCKETS
|""".stripMargin)


sql(
  """
|CREATE TABLE `small_table2` (`cntry_id` DECIMAL(4,0), `curncy_id` 
DECIMAL(4,0), `cntry_desc` STRING, `cntry_code` STRING, `iso_cntry_code` 
STRING, `cultural` STRING, `cntry_busn_unit` STRING, `high_vol_cntry_yn_id` 
TINYINT, `check_sil` TINYINT, `rev_rollup_id` SMALLINT, `rev_rollup` STRING, 
`prft_cntr_id` INT, `prft_cntr` STRING, `cre_date` DATE, `upd_date` TIMESTAMP, 
`cre_user` STRING, `upd_user` STRING)
|USING parquet
|""".stripMargin)

sql(
  """
|CREATE TABLE `small_table3` (`rev_rollup_id` SMALLINT,
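
// For context: EliminateOuterJoin is the optimizer rule that rewrites an outer
// join as an inner join when a predicate above the join rejects the
// null-extended rows. A hypothetical query over the tables above where the
// rewrite fires (not the original benchmark query):
sql(
  """
    |SELECT *
    |FROM big_table1 b
    |LEFT JOIN small_table2 s ON b.byr_cntry_id = s.cntry_id
    |WHERE s.rev_rollup_id IS NOT NULL
    |""".stripMargin).explain(true)
// The optimized plan should show "Join Inner" instead of "Join LeftOuter".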

[jira] [Commented] (SPARK-28743) YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has too many entries

2019-10-25 Thread Anton Ippolitov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959858#comment-16959858
 ] 

Anton Ippolitov commented on SPARK-28743:
-

Hi!

We are seeing the exact same issue with Spark 2.4.4. More specifically, the 
issue arises only for a handful of our Spark jobs and only when we enable 
transport encryption ({{spark.network.crypto.enabled}} set to {{true}}). We are 
able to consistently reproduce the problem: the NodeManager OOMs every time we 
launch these particular jobs. When transport encryption is disabled, we don't 
see this issue anymore.

I have tried bumping the NodeManager's memory via {{YARN_NODEMANAGER_HEAPSIZE}}: 
I set it to 8GB, 16GB and 32GB, but the NodeManager OOMs every time.

I captured a couple of thread dumps from the NodeManager and they look very 
similar to the one posted by [~yangjiandan]. (see screenshot)

Would anyone have any insight into this issue? I would be happy to provide more 
information if needed.

 

  !Screen Shot 2019-10-25 at 17.24.10.png!

 

> YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has 
> too many entries
> --
>
> Key: SPARK-28743
> URL: https://issues.apache.org/jira/browse/SPARK-28743
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.3.0
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: Screen Shot 2019-10-25 at 17.24.10.png, dominator.jpg, 
> histo.jpg
>
>
> The NodeManager heap size is 4G. io.netty.channel.ChannelOutboundBuffer$Entry 
> occupied about 2.8G according to the MAT histogram, and those entries were 
> held by ChannelOutboundBuffer according to MAT's dominator tree. By analyzing 
> one ChannelOutboundBuffer object, I found 248867 entries in it 
> (ChannelOutboundBuffer#flushed=248867) and 
> ChannelOutboundBuffer#totalPendingSize=23891232, which is more than the 
> high water mark (64K), with unwritable=1, meaning the send buffer was full. 
> But the ChannelHandler does not seem to check the unwritable flag when 
> writing messages, so the NodeManager eventually OOMs.
> Histogram:
> !histo.jpg!
> dominator_tree:
> !dominator.jpg!
>  
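
A minimal sketch of the missing backpressure check described above (a hypothetical Netty handler, not Spark's actual code): Netty flips Channel#isWritable to false once the pending bytes in ChannelOutboundBuffer exceed the configured high water mark, and a handler can react to that instead of queueing more writes.

{code:scala}
import io.netty.channel.{ChannelHandlerContext, ChannelInboundHandlerAdapter}

class BackpressureAwareHandler extends ChannelInboundHandlerAdapter {
  override def channelRead(ctx: ChannelHandlerContext, msg: Object): Unit = {
    if (ctx.channel().isWritable) {
      // Below the high water mark: safe to enqueue the response.
      ctx.writeAndFlush(msg)
    } else {
      // Outbound buffer is full: stop reading until Netty reports writability.
      ctx.channel().config().setAutoRead(false)
    }
  }

  override def channelWritabilityChanged(ctx: ChannelHandlerContext): Unit = {
    if (ctx.channel().isWritable) {
      ctx.channel().config().setAutoRead(true)
    }
    ctx.fireChannelWritabilityChanged()
  }
}
{code}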



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28743) YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has too many entries

2019-10-25 Thread Anton Ippolitov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Ippolitov updated SPARK-28743:

Attachment: Screen Shot 2019-10-25 at 17.24.10.png

> YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has 
> too many entries
> --
>
> Key: SPARK-28743
> URL: https://issues.apache.org/jira/browse/SPARK-28743
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.3.0
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: Screen Shot 2019-10-25 at 17.24.10.png, dominator.jpg, 
> histo.jpg
>
>
> The NodeManager heap size is 4G. io.netty.channel.ChannelOutboundBuffer$Entry 
> occupied about 2.8G according to the MAT histogram, and those entries were 
> held by ChannelOutboundBuffer according to MAT's dominator tree. By analyzing 
> one ChannelOutboundBuffer object, I found 248867 entries in it 
> (ChannelOutboundBuffer#flushed=248867) and 
> ChannelOutboundBuffer#totalPendingSize=23891232, which is more than the 
> high water mark (64K), with unwritable=1, meaning the send buffer was full. 
> But the ChannelHandler does not seem to check the unwritable flag when 
> writing messages, so the NodeManager eventually OOMs.
> Histogram:
> !histo.jpg!
> dominator_tree:
> !dominator.jpg!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29605) Optimize string to interval casting

2019-10-25 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29605:
--

 Summary: Optimize string to interval casting
 Key: SPARK-29605
 URL: https://issues.apache.org/jira/browse/SPARK-29605
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Implement a new function, stringToInterval, in IntervalUtils to cast a 
UTF8String value to a CalendarInterval instance. It should be faster than the 
existing implementation.
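
A plausible shape for such a parser (a sketch under the assumption that the fast path consumes number/unit pairs directly instead of going through regexes and intermediate Strings; this is not the merged implementation):

{code:scala}
// Sketch only: fold "N unit [N unit ...]" into months and microseconds
// without any regex matching.
def parseIntervalSketch(s: String): (Int, Long) = {
  var months = 0
  var micros = 0L
  s.trim.toLowerCase.split("\\s+").grouped(2).foreach {
    case Array(n, unit) =>
      val v = n.toLong
      unit match {
        case "year" | "years"     => months += (v * 12).toInt
        case "month" | "months"   => months += v.toInt
        case "day" | "days"       => micros += v * 24 * 3600 * 1000000L
        case "hour" | "hours"     => micros += v * 3600 * 1000000L
        case "minute" | "minutes" => micros += v * 60 * 1000000L
        case "second" | "seconds" => micros += v * 1000000L
        case other => throw new IllegalArgumentException(s"unknown unit: $other")
      }
    case other =>
      throw new IllegalArgumentException(s"dangling token: ${other.mkString}")
  }
  (months, micros)
}
{code}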



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29574) spark with user provided hadoop doesn't work on kubernetes

2019-10-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-29574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959849#comment-16959849
 ] 

Michał Wesołowski commented on SPARK-29574:
---

I investigated the executor issue. The executor doesn't handle the 
SPARK_DIST_CLASSPATH environment variable because on Kubernetes 
org.apache.spark.executor.CoarseGrainedExecutorBackend is simply invoked 
directly and does not respect it. For the executor to "see" the user-provided 
Hadoop dependencies, I modified the entrypoint script so that in the 
SPARK_K8S_CMD executor case it specifies the classpath with 
$SPARK_DIST_CLASSPATH:
{code:java}
...
executor)
  CMD=(
${JAVA_HOME}/bin/java
"${SPARK_EXECUTOR_JAVA_OPTS[@]}"
-Xms$SPARK_EXECUTOR_MEMORY
-Xmx$SPARK_EXECUTOR_MEMORY
-cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
org.apache.spark.executor.CoarseGrainedExecutorBackend
--driver-url $SPARK_DRIVER_URL
--executor-id $SPARK_EXECUTOR_ID
--cores $SPARK_EXECUTOR_CORES
--app-id $SPARK_APPLICATION_ID
--hostname $SPARK_EXECUTOR_POD_IP
  ) {code}
So there are two problems:

The driver doesn't see environment variables from $SPARK_HOME/conf/spark-env.sh 
because the file gets hidden by the mounted config map, and the executor doesn't 
take $SPARK_DIST_CLASSPATH into account at all. 

> spark with user provided hadoop doesn't work on kubernetes
> --
>
> Key: SPARK-29574
> URL: https://issues.apache.org/jira/browse/SPARK-29574
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Michał Wesołowski
>Priority: Major
>
> When spark-submit is run with an image built from "Hadoop free" Spark and 
> user-provided Hadoop, it fails on Kubernetes (the Hadoop libraries are not on 
> Spark's classpath). 
> I downloaded spark [Pre-built with user-provided Apache 
> Hadoop|https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-without-hadoop.tgz].
>  
> I created a docker image using 
> [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh].
>  
>  
> Based on this image (2.4.4-without-hadoop), I created another one with this 
> Dockerfile:
> {code:java}
> FROM spark-py:2.4.4-without-hadoop
> ENV SPARK_HOME=/opt/spark/
> # This is needed for newer kubernetes versions
> ADD 
> https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.6.1/kubernetes-client-4.6.1.jar
>  $SPARK_HOME/jars
> COPY spark-env.sh /opt/spark/conf/spark-env.sh
> RUN chmod +x /opt/spark/conf/spark-env.sh
> RUN wget -qO- 
> https://www-eu.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz 
> | tar xz  -C /opt/
> ENV HADOOP_HOME=/opt/hadoop-3.2.1
> ENV PATH=${HADOOP_HOME}/bin:${PATH}
> {code}
> Contents of spark-env.sh:
> {code:java}
> #!/usr/bin/env bash
> export SPARK_DIST_CLASSPATH=$(hadoop 
> classpath):$HADOOP_HOME/share/hadoop/tools/lib/*
> export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
> {code}
> spark-submit run with an image created this way fails since spark-env.sh is 
> overwritten by the [volume created when the pod 
> starts|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L108].
> As a quick workaround I tried modifying the [entrypoint 
> script|https://github.com/apache/spark/blob/ea8b5df47476fe66b63bd7f7bcd15acfb80bde78/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh]
>  to run spark-env.sh during startup and moved spark-env.sh to a different 
> directory. 
>  The driver starts without issues in this setup; however, even though 
> SPARK_DIST_CLASSPATH is set, the executor is constantly failing:
> {code:java}
> PS 
> C:\Sandbox\projekty\roboticdrive-analytics\components\docker-images\spark-rda>
>  kubectl logs rda-script-1571835692837-exec-12
> ++ id -u
> + myuid=0
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 0
> + uidentry=root:x:0:0:root:/root:/bin/ash
> + set -e
> + '[' -z root:x:0:0:root:/root:/bin/ash ']'
> + source /opt/spark-env.sh
> +++ hadoop classpath
> ++ export 
> 'SPARK_DIST_CLASSPATH=/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoo++
>  
> SPARK_DIST_CLASSPATH='/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yar

[jira] [Resolved] (SPARK-29527) SHOW CREATE TABLE should look up catalog/table like v2 commands

2019-10-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-29527.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26184
[https://github.com/apache/spark/pull/26184]

> SHOW CREATE TABLE should look up catalog/table like v2 commands
> ---
>
> Key: SPARK-29527
> URL: https://issues.apache.org/jira/browse/SPARK-29527
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>
> SHOW CREATE TABLE should look up catalog/table like v2 commands
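
What "v2 lookup" means in practice (an illustrative sketch; the catalog implementation class is a placeholder): the multi-part table name is resolved through the catalog plugin framework rather than only the built-in session catalog.

{code:scala}
// Hypothetical: register a v2 catalog named "testcat", then address a table in
// it with a multi-part identifier. com.example.MyTableCatalog is a placeholder.
spark.conf.set("spark.sql.catalog.testcat", "com.example.MyTableCatalog")
spark.sql("SHOW CREATE TABLE testcat.ns1.tbl").show(false)
{code}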



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29545) Implement bitwise integer aggregates bit_xor

2019-10-25 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-29545.
--
Resolution: Fixed

Resolved by [https://github.com/apache/spark/pull/26205]

> Implement bitwise integer aggregates bit_xor
> 
>
> Key: SPARK-29545
> URL: https://issues.apache.org/jira/browse/SPARK-29545
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> Now that we support bit_and and bit_or, we should also support the related 
> aggregate function bit_xor ahead of PostgreSQL, because many other popular 
> databases support it.
>  
> [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html]
> [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or]
> [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10]
>  
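
Usage sketch of the new aggregate, assuming it follows the same call shape as the existing bit_and and bit_or:

{code:scala}
// 3 ^ 5 ^ 6 = 0, so this returns a single row with 0.
spark.sql("SELECT bit_xor(col) FROM VALUES (3), (5), (6) AS t(col)").show()
{code}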



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29500) Support partition column when writing to Kafka

2019-10-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-29500:


Assignee: Nicola Bova

> Support partition column when writing to Kafka
> --
>
> Key: SPARK-29500
> URL: https://issues.apache.org/jira/browse/SPARK-29500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Nicola Bova
>Assignee: Nicola Bova
>Priority: Major
>  Labels: starter
>
> When writing to a Kafka topic, `KafkaWriter` does not support selecting the 
> output Kafka partition through a DataFrame column.
> While it is possible to configure a custom Kafka partitioner with 
> `.option("kafka.partitioner.class", "my.custom.Partitioner")`, this is not 
> enough for certain use cases.
> After the introduction of GDPR, it is a common pattern to emit records with 
> unique Kafka keys, thus allowing individual records to be tombstoned.
> This strategy implies that the totality of the key information cannot be used 
> to calculate the topic partition and users need to resort to custom 
> partitioners.
> However, as stated at 
> [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#kafka-specific-configurations],
>  "Keys/Values are always serialized with ByteArraySerializer or 
> StringSerializer. Use DataFrame operations to explicitly serialize 
> keys/values into either strings or byte arrays."
> Therefore, a custom partitioner would need to
>  - deserialize the key (or value)
>  - calculate the output partition using a subset of the key (or value) fields
> This is inefficient because it requires an unnecessary deserialization step. 
> It also makes it impossible to use Spark batch writer to send Kafka 
> tombstones when the partition is calculated from a subset of the kafka value.
> It would be a nice addition to let the user choose a partition by setting a 
> value in the "partition" column of the dataframe, as already done for 
> `topic`, `key`, `value`, and `headers` in `KafkaWriter`, also mirroring the 
> `ProducerRecord` API.
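
A sketch of the feature as described (hypothetical DataFrame and column names; the reserved sink columns topic/key/value/headers already exist, and partition is the addition):

{code:scala}
// Route each row to an explicit Kafka partition via a "partition" column,
// mirroring ProducerRecord. df, id, payload and bucket are hypothetical names.
df.selectExpr(
    "CAST(id AS STRING) AS key",
    "CAST(payload AS STRING) AS value",
    "CAST(bucket AS INT) AS partition")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("topic", "events")
  .save()
{code}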



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29500) Support partition column when writing to Kafka

2019-10-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29500.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26153
[https://github.com/apache/spark/pull/26153]

> Support partition column when writing to Kafka
> --
>
> Key: SPARK-29500
> URL: https://issues.apache.org/jira/browse/SPARK-29500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Nicola Bova
>Assignee: Nicola Bova
>Priority: Major
>  Labels: starter
> Fix For: 3.0.0
>
>
> When writing to a Kafka topic, `KafkaWriter` does not support selecting the 
> output Kafka partition through a DataFrame column.
> While it is possible to configure a custom Kafka partitioner with 
> `.option("kafka.partitioner.class", "my.custom.Partitioner")`, this is not 
> enough for certain use cases.
> After the introduction of GDPR, it is a common pattern to emit records with 
> unique Kafka keys, thus allowing individual records to be tombstoned.
> This strategy implies that the totality of the key information cannot be used 
> to calculate the topic partition and users need to resort to custom 
> partitioners.
> However, as stated at 
> [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#kafka-specific-configurations],
>  "Keys/Values are always serialized with ByteArraySerializer or 
> StringSerializer. Use DataFrame operations to explicitly serialize 
> keys/values into either strings or byte arrays."
> Therefore, a custom partitioner would need to
>  - deserialize the key (or value)
>  - calculate the output partition using a subset of the key (or value) fields
> This is inefficient because it requires an unnecessary deserialization step. 
> It also makes it impossible to use Spark batch writer to send Kafka 
> tombstones when the partition is calculated from a subset of the kafka value.
> It would be a nice addition to let the user choose a partition by setting a 
> value in the "partition" column of the dataframe, as already done for 
> `topic`, `key`, `value`, and `headers` in `KafkaWriter`, also mirroring the 
> `ProducerRecord` API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set

2019-10-25 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959711#comment-16959711
 ] 

Jungtaek Lim commented on SPARK-29604:
--

I've figured out the root cause and have a patch. Will submit a patch soon. I 
may need some more time to craft a relevant test.
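
For reference, a minimal sketch of a setup that hits this path (values are illustrative): spark.sql.hive.metastore.jars forces the Hive client into an isolated classloader, and a listener configured via spark.sql.streaming.streamingQueryListeners is then apparently looked up through that loader.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.hive.metastore.version", "2.3.6")          // illustrative version
  .config("spark.sql.hive.metastore.jars", "/opt/hive/lib/*")   // triggers the isolated classloader
  .config("spark.sql.streaming.streamingQueryListeners",
          "com.example.MyListener")                             // hypothetical listener class
  .enableHiveSupport()
  .getOrCreate()
{code}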

> SessionState is initialized with isolated classloader for Hive if 
> spark.sql.hive.metastore.jars is being set
> 
>
> Key: SPARK-29604
> URL: https://issues.apache.org/jira/browse/SPARK-29604
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> I've observed that external listeners cannot be loaded properly when 
> spark-sql is run with the "spark.sql.hive.metastore.jars" configuration set.
> {noformat}
> Exception in thread "main" java.lang.IllegalArgumentException: Error while 
> instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103)
>   at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.d

[jira] [Created] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set

2019-10-25 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-29604:


 Summary: SessionState is initialized with isolated classloader for 
Hive if spark.sql.hive.metastore.jars is being set
 Key: SPARK-29604
 URL: https://issues.apache.org/jira/browse/SPARK-29604
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.4, 3.0.0
Reporter: Jungtaek Lim


I've observed that external listeners cannot be loaded properly when spark-sql 
is run with the "spark.sql.hive.metastore.jars" configuration set.

{noformat}
Exception in thread "main" java.lang.IllegalArgumentException: Error while 
instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102)
at 
org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154)
at 
org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153)
at 
org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150)
at 
org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
at 
org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104)
at 
org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103)
at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
at 
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at 
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Exception when registering 
StreamingQueryListener
at 
org.apache.spark.sql.streaming.StreamingQueryManager.(StreamingQueryManager.scala:70)
at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.streamingQueryManager(BaseSessionStateBuilder.scala:260)
at 
org.apache.spar

[jira] [Created] (SPARK-29603) Support application priority for spark on yarn

2019-10-25 Thread Kent Yao (Jira)
Kent Yao created SPARK-29603:


 Summary: Support application priority for spark on yarn
 Key: SPARK-29603
 URL: https://issues.apache.org/jira/browse/SPARK-29603
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 3.0.0
Reporter: Kent Yao


We can set a priority on an application to let YARN define its pending-application 
ordering policy; applications with higher priority have a better chance of being 
activated. This applies to the YARN CapacityScheduler only.
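
A sketch of how this could look (the configuration key is an assumption; it is not finalized in this ticket):

{code:scala}
// Hypothetical: request a higher scheduling priority for a YARN application.
// "spark.yarn.priority" is an assumed key, pending the actual implementation.
val spark = org.apache.spark.sql.SparkSession.builder()
  .config("spark.yarn.priority", "10")
  .getOrCreate()
{code}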



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-9612) Add instance weight support for GBTs

2019-10-25 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reopened SPARK-9612:
-
  Assignee: zhengruifeng  (was: DB Tsai)

> Add instance weight support for GBTs
> 
>
> Key: SPARK-9612
> URL: https://issues.apache.org/jira/browse/SPARK-9612
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: zhengruifeng
>Priority: Minor
>  Labels: bulk-closed
>
> GBT support for instance weights could be handled by:
> * sampling data before passing it to trees
> * passing weights to trees (requiring weight support for trees first, but 
> probably better in the end)
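
A usage sketch of the second option once implemented (assuming GBTClassifier gains the same weightCol Param as other ML estimators):

{code:scala}
import org.apache.spark.ml.classification.GBTClassifier

// Hypothetical until released: per-instance weights flow into the trees
// instead of being approximated by resampling.
val gbt = new GBTClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setWeightCol("weight")
{code}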



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9612) Add instance weight support for GBTs

2019-10-25 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-9612.
-
Resolution: Fixed

> Add instance weight support for GBTs
> 
>
> Key: SPARK-9612
> URL: https://issues.apache.org/jira/browse/SPARK-9612
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: zhengruifeng
>Priority: Minor
>  Labels: bulk-closed
>
> GBT support for instance weights could be handled by:
> * sampling data before passing it to trees
> * passing weights to trees (requiring weight support for trees first, but 
> probably better in the end)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29599) Support pagination for session table in JDBC/ODBC Tab

2019-10-25 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-29599:
--
Summary: Support pagination for session table in JDBC/ODBC Tab   (was: 
Support pagination for session table in JDBC/ODBC Session page )

> Support pagination for session table in JDBC/ODBC Tab 
> --
>
> Key: SPARK-29599
> URL: https://issues.apache.org/jira/browse/SPARK-29599
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.4.0, 3.0.0
>Reporter: angerszhu
>Priority: Minor
>
> Support pagination for session table in JDBC/ODBC Session page 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29602) How does the spark from_json json and dataframe transform ignore the case of the json key

2019-10-25 Thread ruiliang (Jira)
ruiliang created SPARK-29602:


 Summary: How does the spark from_json json and dataframe transform 
ignore the case of the json key
 Key: SPARK-29602
 URL: https://issues.apache.org/jira/browse/SPARK-29602
 Project: Spark
  Issue Type: Question
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: ruiliang


How can Spark's from_json and DataFrame transformations be made to ignore the 
case of JSON keys?

Code:
{code:java}
def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[*]")
    .enableHiveSupport().getOrCreate()
  //spark.sqlContext.setConf("spark.sql.caseSensitive", "false")
  import spark.implicits._

  // hive table data: keys are lower-cased automatically when saving
  val hivetable =
    """{"deliverysystype":"dms","orderid":"B0001-N103-000-005882-RL3AI2RWCP","storeid":103,"timestamp":1571587522000,"":"dms"}"""
  val hiveDF = Seq(hivetable).toDF("msg")
  val rdd = hiveDF.rdd.map(_.getString(0))
  val jsonDataDF = spark.read.json(rdd.toDS())
  jsonDataDF.show(false)
  // +---+---------------+--------------------------------+-------+-------------+
  // |   |deliverysystype|orderid                         |storeid|timestamp    |
  // +---+---------------+--------------------------------+-------+-------------+
  // |dms|dms            |B0001-N103-000-005882-RL3AI2RWCP|103    |1571587522000|
  // +---+---------------+--------------------------------+-------+-------------+

  val jsonstr =
    """{"data":{"deliverySysType":"dms","orderId":"B0001-N103-000-005882-RL3AI2RWCP","storeId":103,"timestamp":1571587522000},"accessKey":"f9d069861dfb1678","actionName":"candao.rider.getDeliveryInfo","sign":"fa0239c75e065cf43d0a4040665578ba" }"""
  val jsonStrDF = Seq(jsonstr).toDF("msg")
  // parse the JSON string column (note the camel-case keys, e.g. actionName)
  jsonStrDF.show(false)
  val structSeqSchme = StructType(Seq(
    StructField("data", jsonDataDF.schema, true),
    StructField("accessKey", StringType, true), // accessKey is expected here
    StructField("actionName", StringType, true),
    StructField("columnNameOfCorruptRecord", StringType, true)
  ))
  // hive column names are lower case while the JSON keys are mixed case,
  // so the mixed-case keys never match and their values are lost
  val mapOption = Map("allowBackslashEscapingAnyCharacter" -> "true",
    "allowUnquotedControlChars" -> "true", "allowSingleQuotes" -> "true")
  // none of these options helps here; is there a value that can be set for this?
  val newDF = jsonStrDF.withColumn("data_col",
    from_json(col("msg"), structSeqSchme, mapOption))
  newDF.show(false)
  newDF.printSchema()
  newDF.select($"data_col.accessKey", $"data_col.actionName",
    $"data_col.data.*", $"data_col.columnNameOfCorruptRecord").show(false)
  // Lower-case schema fields do not pick up the camel-case JSON values.
  // How do you make the match ignore case? deliverysystype, orderid, storeid -> null
  // +----------------+----------------------------+----+---------------+-------+-------+-------------+-------------------------+
  // |accessKey       |actionName                  |    |deliverysystype|orderid|storeid|timestamp    |columnNameOfCorruptRecord|
  // +----------------+----------------------------+----+---------------+-------+-------+-------------+-------------------------+
  // |f9d069861dfb1678|candao.rider.getDeliveryInfo|null|null           |null   |null   |1571587522000|null                     |
  // +----------------+----------------------------+----+---------------+-------+-------+-------------+-------------------------+
}
{code}
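
A possible workaround (a sketch, not from the ticket): declare the from_json schema with the exact camel-case keys used in the payload, then rename the parsed columns back to the lower-case Hive names. It reuses jsonStrDF from the reproduction above.

{code:scala}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Match the payload's camel case in the schema, then rename afterwards.
val dataSchema = StructType(Seq(
  StructField("deliverySysType", StringType, true),
  StructField("orderId", StringType, true),
  StructField("storeId", LongType, true),
  StructField("timestamp", LongType, true)))
val fullSchema = StructType(Seq(
  StructField("data", dataSchema, true),
  StructField("accessKey", StringType, true),
  StructField("actionName", StringType, true)))

val fixed = jsonStrDF
  .withColumn("data_col", from_json(col("msg"), fullSchema))
  .select("data_col.data.*", "data_col.accessKey", "data_col.actionName")
  .withColumnRenamed("deliverySysType", "deliverysystype")
  .withColumnRenamed("orderId", "orderid")
  .withColumnRenamed("storeId", "storeid")
fixed.show(false)
{code}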



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient

2019-10-25 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959512#comment-16959512
 ] 

Gabor Somogyi commented on SPARK-29580:
---

I think `It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector` appears because 
the issue happens in `beforeAll`.
I've tried to reproduce it, but no luck so far. I've also taken a look at the 
Jenkins logs, but they don't show why this happened.
I think we should add further debug logging in a PR and then
 * keep trying to reproduce it
 * wait on Jenkins and take a look at the logs when it happens again

> KafkaDelegationTokenSuite fails to create new KafkaAdminClient
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.logi

[jira] [Resolved] (SPARK-29461) Spark dataframe writer does not expose metrics for JDBC writer

2019-10-25 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-29461.
--
Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

Resolved by [https://github.com/apache/spark/pull/26109]

> Spark dataframe writer does not expose metrics for JDBC writer 
> ---
>
> Key: SPARK-29461
> URL: https://issues.apache.org/jira/browse/SPARK-29461
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ROHIT KALHANS
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> Spark does not expose the writer metrics when using the Dataframe JDBC 
> writer. Similar instances of such bugs have been fixed in previous versions. 
> However, it seems the fix was not exhaustive since it does not cover all the 
> writers. 
> Similar bugs: 
> https://issues.apache.org/jira/browse/SPARK-21882
> https://issues.apache.org/jira/browse/SPARK-22605
>  
>  
> Console reporter output:
> app-name.1.executor.bytesWritten
>             count = 0
>
> app-name.1.executor.recordsWritten
>             count = 0
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org