[jira] [Assigned] (SPARK-29490) Reset 'WritableColumnVector' in 'RowToColumnarExec'
[ https://issues.apache.org/jira/browse/SPARK-29490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29490: - Assignee: Rong Ma > Reset 'WritableColumnVector' in 'RowToColumnarExec' > --- > > Key: SPARK-29490 > URL: https://issues.apache.org/jira/browse/SPARK-29490 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Rong Ma >Assignee: Rong Ma >Priority: Major > > When converting {{Iterator[InternalRow]}} to {{Iterator[ColumnarBatch]}}, the > vectors used to create a new {{ColumnarBatch}} should be reset in the > iterator's "next()" method. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29490) Reset 'WritableColumnVector' in 'RowToColumnarExec'
[ https://issues.apache.org/jira/browse/SPARK-29490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29490. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26137 [https://github.com/apache/spark/pull/26137] > Reset 'WritableColumnVector' in 'RowToColumnarExec' > --- > > Key: SPARK-29490 > URL: https://issues.apache.org/jira/browse/SPARK-29490 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Rong Ma >Assignee: Rong Ma >Priority: Major > Fix For: 3.0.0 > > > When converting {{Iterator[InternalRow]}} to {{Iterator[ColumnarBatch]}}, the > vectors used to create a new {{ColumnarBatch}} should be reset in the > iterator's "next()" method. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
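Why the reset matters can be pictured outside of Spark: an iterator that reuses a single output buffer across `next()` calls must clear it on every call, or rows from the previous batch leak into the new one. A minimal pure-Python analogy (illustrative only, not the actual `RowToColumnarExec` code; `BatchingIterator` is a made-up name):

```python
# Illustrative analogy only (not Spark's actual RowToColumnarExec code):
# a batch iterator that reuses one output buffer must clear it in next(),
# otherwise rows from the previous batch leak into the new one.

class BatchingIterator:
    def __init__(self, rows, batch_size):
        self.rows = iter(rows)
        self.batch_size = batch_size
        self.buffer = []  # reused across calls, like a WritableColumnVector

    def __iter__(self):
        return self

    def __next__(self):
        self.buffer.clear()  # the "reset" the issue calls for
        for row in self.rows:
            self.buffer.append(row)
            if len(self.buffer) == self.batch_size:
                return list(self.buffer)
        if self.buffer:
            return list(self.buffer)  # final, possibly partial, batch
        raise StopIteration

batches = list(BatchingIterator(range(5), 2))
# batches == [[0, 1], [2, 3], [4]]
```

Without the `clear()`, the third batch would still contain rows from the second, which is exactly the stale-data symptom the fix prevents.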
[jira] [Updated] (SPARK-29554) Add `version` SQL function
[ https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29554: -- Description: |string|version()|Returns the Hive version (as of Hive 2.1.0). The string contains 2 fields, the first being a build number and the second being a build hash. Example: "select version();" might return "2.1.0 r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your build.| [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] was: |string|version()|Returns the Hive version (as of Hive 2.1.0). The string contains 2 fields, the first being a build number and the second being a build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your build.| [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] > Add `version` SQL function > -- > > Key: SPARK-29554 > URL: https://issues.apache.org/jira/browse/SPARK-29554 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.0.0 > > > |string|version()|Returns the Hive version (as of Hive 2.1.0). The string > contains 2 fields, the first being a build number and the second being a > build hash. Example: "select version();" might return "2.1.0 > r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on > your build.| > > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29554) Add a misc function named version
[ https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29554: - Assignee: Kent Yao > Add a misc function named version > - > > Key: SPARK-29554 > URL: https://issues.apache.org/jira/browse/SPARK-29554 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > > |string|version()|Returns the Hive version (as of Hive 2.1.0). The string > contains 2 fields, the first being a build number and the second being a > build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 > r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on > your build.| > > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29554) Add a misc function named version
[ https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29554. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26209 [https://github.com/apache/spark/pull/26209] > Add a misc function named version > - > > Key: SPARK-29554 > URL: https://issues.apache.org/jira/browse/SPARK-29554 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.0.0 > > > |string|version()|Returns the Hive version (as of Hive 2.1.0). The string > contains 2 fields, the first being a build number and the second being a > build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 > r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on > your build.| > > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29554) Add `version` SQL function
[ https://issues.apache.org/jira/browse/SPARK-29554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29554: -- Summary: Add `version` SQL function (was: Add a misc function named version) > Add `version` SQL function > -- > > Key: SPARK-29554 > URL: https://issues.apache.org/jira/browse/SPARK-29554 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.0.0 > > > |string|version()|Returns the Hive version (as of Hive 2.1.0). The string > contains 2 fields, the first being a build number and the second being a > build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 > r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on > your build.| > > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
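Per the description, the returned string carries two space-separated fields: a build number and a build hash. A small sketch splitting the example value quoted above (the actual output of `SELECT version()` depends on the build):

```python
# Example value taken from the description above; real output of
# `SELECT version()` depends on the build.
version_string = "2.1.0 r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232"

# The string has two fields: a build number and a build hash.
build_number, build_hash = version_string.split(" ", 1)
print(build_number)  # 2.1.0
print(build_hash)    # r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232
```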
[jira] [Created] (SPARK-29610) Keys with Null values are discarded when using to_json function
Jonathan created SPARK-29610: Summary: Keys with Null values are discarded when using to_json function Key: SPARK-29610 URL: https://issues.apache.org/jira/browse/SPARK-29610 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.4 Reporter: Jonathan When calling to_json on a struct, if a key has null as its value then the key is thrown away.
{code:java}
import pyspark.sql.functions as F

l = [("a", "foo"), ("b", None)]
df = spark.createDataFrame(l, ["id", "data"])
(
    df.select(F.struct("*").alias("payload"))
    .withColumn("payload", F.to_json(F.col("payload")))
    .select("payload")
    .show()
)
{code}
Produces the following output:
{noformat}
+--------------------+
|             payload|
+--------------------+
|{"id":"a","data":...|
|          {"id":"b"}|
+--------------------+
{noformat}
The `data` key in the second row has just been silently deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
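For contrast, most JSON serializers keep null-valued keys rather than dropping them; for example, Python's stdlib `json` module emits an explicit `null`:

```python
import json

# The second row from the report above: `data` is None/null.
row = {"id": "b", "data": None}

# Stdlib json keeps the key with an explicit null, unlike the
# observed to_json behavior, which drops the key entirely.
print(json.dumps(row))  # {"id": "b", "data": null}
```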
[jira] [Resolved] (SPARK-29033) Always use CreateNamedStructUnsafe, the UnsafeRow-based version of the CreateNamedStruct codepath
[ https://issues.apache.org/jira/browse/SPARK-29033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-29033. Resolution: Won't Fix I'm resolving this as "Won't Fix" because we ended up doing the opposite in SPARK-29503, removing CreateNamedStructUnsafe because its sole use was associated with a correctness bug. > Always use CreateNamedStructUnsafe, the UnsafeRow-based version of the > CreateNamedStruct codepath > - > > Key: SPARK-29033 > URL: https://issues.apache.org/jira/browse/SPARK-29033 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > > Spark 2.x has two separate implementations of the "create named struct" > expression: regular {{CreateNamedStruct}} and {{CreateNamedStructUnsafe}}. > The "unsafe" version was added in SPARK-9373 to support structs in > {{GenerateUnsafeProjection}}. These two expressions both extend the > {{CreateNamedStructLike}} trait. > For Spark 3.0, I propose to always use the "unsafe" code path: this will > avoid object allocation / boxing inefficiencies in the "safe" codepath, which > is an especially big problem when generating Encoders for deeply-nested case > classes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960244#comment-16960244 ] feiwang edited comment on SPARK-27736 at 10/26/19 2:30 AM: --- Hi, we met this issue recently. [~joshrosen] [~tgraves] How about implementing a simple solution: * Let ExternalShuffleClient query whether an executor is registered in the ESS * when a FetchFailedException is thrown, check whether that executor is registered in the ESS * if not, remove all outputs of executors on that host which are not registered. If this is OK, I can implement it. was (Author: hzfeiwang): Hi, we met this issue recently. [~joshrosen] [~tgraves] How about implementing a simple solution: * Let ExternalShuffleClient query whether an executor is registered in the ESS * when removing an executor, check whether that executor is registered in the ESS * if not, remove all outputs of executors on that host which are not registered. If this is OK, I can implement it. > Improve handling of FetchFailures caused by ExternalShuffleService losing > track of executor registrations > - > > Key: SPARK-27736 > URL: https://issues.apache.org/jira/browse/SPARK-27736 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: Josh Rosen >Priority: Minor > > This ticket describes a fault-tolerance edge-case which can cause Spark jobs > to fail if a single external shuffle service process reboots and fails to > recover the list of registered executors (something which can happen when > using YARN if NodeManager recovery is disabled) _and_ the Spark job has a > large number of executors per host. > I believe this problem can be worked around today via a change of > configurations, but I'm filing this issue to (a) better document this > problem, and (b) propose either a change of default configurations or > additional DAGScheduler logic to better handle this failure mode. > h2.
Problem description > The external shuffle service process is _mostly_ stateless except for a map > tracking the set of registered applications and executors. > When processing a shuffle fetch request, the shuffle services first checks > whether the requested block ID's executor is registered; if it's not > registered then the shuffle service throws an exception like > {code:java} > java.lang.RuntimeException: Executor is not registered > (appId=application_1557557221330_6891, execId=428){code} > and this exception becomes a {{FetchFailed}} error in the executor requesting > the shuffle block. > In normal operation this error should not occur because executors shouldn't > be mis-routing shuffle fetch requests. However, this _can_ happen if the > shuffle service crashes and restarts, causing it to lose its in-memory > executor registration state. With YARN this state can be recovered from disk > if YARN NodeManager recovery is enabled (using the mechanism added in > SPARK-9439), but I don't believe that we perform state recovery in Standalone > and Mesos modes (see SPARK-24223). > If state cannot be recovered then map outputs cannot be served (even though > the files probably still exist on disk). In theory, this shouldn't cause > Spark jobs to fail because we can always redundantly recompute lost / > unfetchable map outputs. > However, in practice this can cause total job failures in deployments where > the node with the failed shuffle service was running a large number of > executors: by default, the DAGScheduler unregisters map outputs _only from > individual executor whose shuffle blocks could not be fetched_ (see > [code|https://github.com/apache/spark/blame/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1643]), > so it can take several rounds of failed stage attempts to fail and clear > output from all executors on the faulty host. 
If the number of executors on a > host is greater than the stage retry limit then this can exhaust stage retry > attempts and cause job failures. > This "multiple rounds of recomputation to discover all failed executors on a > host" problem was addressed by SPARK-19753, which added a > {{spark.files.fetchFailure.unRegisterOutputOnHost}} configuration which > promotes executor fetch failures into host-wide fetch failures (clearing > output from all neighboring executors upon a single failure). However, that > configuration is {{false}} by default. > h2. Potential solutions > I have a few ideas about how we can improve this situation: > - Update the [YARN external shuffle service > documentation|https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service] >
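The commenter's three-step proposal above can be sketched in pure Python (illustrative pseudocode of the proposal, not actual Spark code; the dictionaries are hypothetical stand-ins for the ESS registration table and the map output tracker):

```python
# Pure-Python sketch of the proposed handling (not actual Spark code):
# on a fetch failure, ask the shuffle service whether the executor is still
# registered; if it is not, drop map outputs for every executor on that host.

ess_registered = {"host1": {"exec-1"}, "host2": {"exec-3"}}  # hypothetical ESS state
map_outputs = {"exec-1": "host1", "exec-2": "host1", "exec-3": "host2"}

def handle_fetch_failure(executor_id, host):
    if executor_id not in ess_registered.get(host, set()):
        # The ESS lost this executor's registration: assume the whole host's
        # shuffle state is gone and unregister all of its outputs at once.
        lost = [e for e, h in map_outputs.items() if h == host]
        for e in lost:
            del map_outputs[e]
        return lost
    return []  # executor still registered; no host-wide cleanup needed

removed = handle_fetch_failure("exec-2", "host1")
# removed == ["exec-1", "exec-2"]; only exec-3 on host2 keeps its outputs
```

The point of the sketch is the single-round cleanup: one unregistered executor triggers removal of all outputs on its host, instead of one retry round per executor.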
[jira] [Commented] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960244#comment-16960244 ] feiwang commented on SPARK-27736: - Hi, we met this issue recently. [~joshrosen] [~tgraves] How about implementing a simple solution: * Let ExternalShuffleClient query whether an executor is registered in the ESS * when removing an executor, check whether that executor is registered in the ESS * if not, remove all outputs of executors on that host which are not registered. If this is OK, I can implement it. > Improve handling of FetchFailures caused by ExternalShuffleService losing > track of executor registrations > - > > Key: SPARK-27736 > URL: https://issues.apache.org/jira/browse/SPARK-27736 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: Josh Rosen >Priority: Minor > > This ticket describes a fault-tolerance edge-case which can cause Spark jobs > to fail if a single external shuffle service process reboots and fails to > recover the list of registered executors (something which can happen when > using YARN if NodeManager recovery is disabled) _and_ the Spark job has a > large number of executors per host. > I believe this problem can be worked around today via a change of > configurations, but I'm filing this issue to (a) better document this > problem, and (b) propose either a change of default configurations or > additional DAGScheduler logic to better handle this failure mode. > h2. Problem description > The external shuffle service process is _mostly_ stateless except for a map > tracking the set of registered applications and executors.
> When processing a shuffle fetch request, the shuffle services first checks > whether the requested block ID's executor is registered; if it's not > registered then the shuffle service throws an exception like > {code:java} > java.lang.RuntimeException: Executor is not registered > (appId=application_1557557221330_6891, execId=428){code} > and this exception becomes a {{FetchFailed}} error in the executor requesting > the shuffle block. > In normal operation this error should not occur because executors shouldn't > be mis-routing shuffle fetch requests. However, this _can_ happen if the > shuffle service crashes and restarts, causing it to lose its in-memory > executor registration state. With YARN this state can be recovered from disk > if YARN NodeManager recovery is enabled (using the mechanism added in > SPARK-9439), but I don't believe that we perform state recovery in Standalone > and Mesos modes (see SPARK-24223). > If state cannot be recovered then map outputs cannot be served (even though > the files probably still exist on disk). In theory, this shouldn't cause > Spark jobs to fail because we can always redundantly recompute lost / > unfetchable map outputs. > However, in practice this can cause total job failures in deployments where > the node with the failed shuffle service was running a large number of > executors: by default, the DAGScheduler unregisters map outputs _only from > individual executor whose shuffle blocks could not be fetched_ (see > [code|https://github.com/apache/spark/blame/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1643]), > so it can take several rounds of failed stage attempts to fail and clear > output from all executors on the faulty host. If the number of executors on a > host is greater than the stage retry limit then this can exhaust stage retry > attempts and cause job failures. 
> This "multiple rounds of recomputation to discover all failed executors on a > host" problem was addressed by SPARK-19753, which added a > {{spark.files.fetchFailure.unRegisterOutputOnHost}} configuration which > promotes executor fetch failures into host-wide fetch failures (clearing > output from all neighboring executors upon a single failure). However, that > configuration is {{false}} by default. > h2. Potential solutions > I have a few ideas about how we can improve this situation: > - Update the [YARN external shuffle service > documentation|https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service] > to recommend enabling node manager recovery. > - Consider defaulting {{spark.files.fetchFailure.unRegisterOutputOnHost}} to > {{true}}. This would improve out-of-the-box resiliency for large clusters. > The trade-off here is a reduction of efficiency in case there are transient > "false positive" fetch failures, but I suspect this case may be unlikely in > practice (so the change of default could be an acceptable trade-off). See > [prior discussion on >
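The `spark.files.fetchFailure.unRegisterOutputOnHost` setting named above already exists and can be turned on today; a sketch of the corresponding `spark-defaults.conf` entry:

```
# Promote a single executor's fetch failure into a host-wide one, so map
# outputs from all executors on the faulty host are cleared in one round.
spark.files.fetchFailure.unRegisterOutputOnHost  true
```

As the description notes, the trade-off is extra recomputation when a fetch failure is a transient false positive.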
[jira] [Updated] (SPARK-29608) Add Hadoop 3.2 profile to binary package
[ https://issues.apache.org/jira/browse/SPARK-29608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29608: -- Issue Type: Improvement (was: Task) > Add Hadoop 3.2 profile to binary package > > > Key: SPARK-29608 > URL: https://issues.apache.org/jira/browse/SPARK-29608 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29608) Add Hadoop 3.2 profile to binary package
Dongjoon Hyun created SPARK-29608: - Summary: Add Hadoop 3.2 profile to binary package Key: SPARK-29608 URL: https://issues.apache.org/jira/browse/SPARK-29608 Project: Spark Issue Type: Task Components: Build Affects Versions: 3.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29608) Add Hadoop 3.2 profile to binary package
[ https://issues.apache.org/jira/browse/SPARK-29608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29608. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26260 [https://github.com/apache/spark/pull/26260] > Add Hadoop 3.2 profile to binary package > > > Key: SPARK-29608 > URL: https://issues.apache.org/jira/browse/SPARK-29608 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25906) spark-shell cannot handle `-i` option correctly
[ https://issues.apache.org/jira/browse/SPARK-25906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25906: -- Fix Version/s: (was: 3.0.0) > spark-shell cannot handle `-i` option correctly > --- > > Key: SPARK-25906 > URL: https://issues.apache.org/jira/browse/SPARK-25906 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 2.4.1 > > > This is a regression on Spark 2.4.0. > *Spark 2.3.2* > {code:java} > $ cat test.scala > spark.version > case class Record(key: Int, value: String) > spark.sparkContext.parallelize((1 to 2).map(i => Record(i, > s"val_$i"))).toDF.show > $ bin/spark-shell -i test.scala > 18/10/31 23:22:43 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > Spark context Web UI available at http://localhost:4040 > Spark context available as 'sc' (master = local[*], app id = > local-1541053368478). > Spark session available as 'spark'. > Loading test.scala... > res0: String = 2.3.2 > defined class Record > 18/10/31 23:22:56 WARN ObjectStore: Failed to get database global_temp, > returning NoSuchObjectException > +---+-+ > |key|value| > +---+-+ > | 1|val_1| > | 2|val_2| > +---+-+ > {code} > *Spark 2.4.0 RC5* > {code:java} > $ bin/spark-shell -i test.scala > 2018-10-31 23:23:14 WARN NativeCodeLoader:62 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > Spark context Web UI available at http://localhost:4040 > Spark context available as 'sc' (master = local[*], app id = > local-1541053400312). 
> Spark session available as 'spark'. > test.scala:17: error: value toDF is not a member of > org.apache.spark.rdd.RDD[Record] > Error occurred in an application involving default arguments. >spark.sparkContext.parallelize((1 to 2).map(i => Record(i, > s"val_$i"))).toDF.show > {code} > *WORKAROUND* > Add the following line at the first of the script. > {code} > import spark.implicits._ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29607) Move static methods from CalendarInterval to IntervalUtils
Maxim Gekk created SPARK-29607: -- Summary: Move static methods from CalendarInterval to IntervalUtils Key: SPARK-29607 URL: https://issues.apache.org/jira/browse/SPARK-29607 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Move static methods from the CalendarInterval class to the helper object IntervalUtils. This requires rewriting the Java code in Scala. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29608) Add Hadoop 3.2 profile to binary package
[ https://issues.apache.org/jira/browse/SPARK-29608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29608: - Assignee: Dongjoon Hyun > Add Hadoop 3.2 profile to binary package > > > Key: SPARK-29608 > URL: https://issues.apache.org/jira/browse/SPARK-29608 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29609) DataSourceV2: Support DROP NAMESPACE
Terry Kim created SPARK-29609: - Summary: DataSourceV2: Support DROP NAMESPACE Key: SPARK-29609 URL: https://issues.apache.org/jira/browse/SPARK-29609 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Terry Kim DROP NAMESPACE needs to support v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
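Once supported, the command would be exercised against a v2 catalog with statements along these lines (a sketch only; `testcat` is a hypothetical catalog name, and the exact grammar is up to the implementing PR):

```sql
-- `testcat` is assumed to be a registered v2 catalog.
CREATE NAMESPACE testcat.ns1;
DROP NAMESPACE testcat.ns1;
DROP NAMESPACE IF EXISTS testcat.ns2;
```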
[jira] [Commented] (SPARK-29580) Add kerberos debug messages for Kafka secure tests
[ https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960077#comment-16960077 ] Dongjoon Hyun commented on SPARK-29580: --- We will open a new JIRA if this happens again. For now, the PR is the best effort we can do. > Add kerberos debug messages for Kafka secure tests > -- > > Key: SPARK-29580 > URL: https://issues.apache.org/jira/browse/SPARK-29580 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ > {code} > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Server not found in Kerberos > database (7) - Server not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382) > ... 16 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Server not found in Kerberos database (7) - Server not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) > at > org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:61) > at > org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104) > at > org.apache.kafka.common.net
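On the JVM, Kerberos/JAAS debug output of the kind this issue asks for is conventionally enabled with system properties such as the following (an assumption about the mechanism; the linked PR may wire it up differently):

```
# Illustrative JVM flags for surfacing Kerberos debug output in the tests;
# how the PR actually enables them is not shown here.
-Dsun.security.krb5.debug=true
-Dsun.security.jgss.debug=true
```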
[jira] [Updated] (SPARK-29580) Add kerberos debug messages for Kafka secure tests
[ https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29580: -- Summary: Add kerberos debug messages for Kafka secure tests (was: KafkaDelegationTokenSuite fails to create new KafkaAdminClient) > Add kerberos debug messages for Kafka secure tests > -- > > Key: SPARK-29580 > URL: https://issues.apache.org/jira/browse/SPARK-29580 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ > {code} > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Server not found in Kerberos > database (7) - Server not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382) > ... 16 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Server not found in Kerberos database (7) - Server not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) > at > org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61) > at > org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104) > at > org.apache.kafka.common.network.SaslChannel
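For readers chasing similar failures: the debug messages this ticket adds come from the JDK's built-in Kerberos/JAAS debug switches. A minimal sketch of turning them on (illustrative only, not the actual SPARK-29580 patch):

```java
// Illustrative sketch, not the actual Spark patch: enables the JDK's
// built-in Kerberos/JAAS debug logging before any login is attempted.
// The system properties are standard JDK switches, not Spark APIs.
public class KerberosDebug {
    public static void enable() {
        // With these set, failures such as "Server not found in Kerberos
        // database" are logged together with the full KDC exchange.
        System.setProperty("sun.security.krb5.debug", "true");
        System.setProperty("sun.security.jgss.debug", "true");
    }

    public static void main(String[] args) {
        enable();
        System.out.println("krb5 debug = " + System.getProperty("sun.security.krb5.debug"));
    }
}
```

The same switches can be passed on the command line as `-Dsun.security.krb5.debug=true`, which is usually more convenient for a forked test JVM.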
[jira] [Resolved] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient
[ https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29580. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26252 [https://github.com/apache/spark/pull/26252] > KafkaDelegationTokenSuite fails to create new KafkaAdminClient > -- > > Key: SPARK-29580 > URL: https://issues.apache.org/jira/browse/SPARK-29580 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ > {code} > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Server not found in Kerberos > database (7) - Server not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382) > ... 16 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Server not found in Kerberos database (7) - Server not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) > at > org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61) > at > org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104) > at > org.apache.kafka.com
[jira] [Assigned] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient
[ https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29580: - Assignee: Gabor Somogyi > KafkaDelegationTokenSuite fails to create new KafkaAdminClient > -- > > Key: SPARK-29580 > URL: https://issues.apache.org/jira/browse/SPARK-29580 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Gabor Somogyi >Priority: Major > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ > {code} > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Server not found in Kerberos > database (7) - Server not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382) > ... 16 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Server not found in Kerberos database (7) - Server not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > 
org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) > at > org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61) > at > org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104) > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:149) > ... 20 more > Caused by: sbt.ForkMain$ForkError: sun.
[jira] [Commented] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient
[ https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959962#comment-16959962 ] Dongjoon Hyun commented on SPARK-29580: --- Thank you for taking a look, [~gsomogyi]. +1 for waiting for the next failure. If that doesn't happen frequently, it's okay to leave this AS-IS status. Your time is precious. > KafkaDelegationTokenSuite fails to create new KafkaAdminClient > -- > > Key: SPARK-29580 > URL: https://issues.apache.org/jira/browse/SPARK-29580 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ > {code} > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at 
sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Server not found in Kerberos > database (7) - Server not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382) > ... 
16 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Server not found in Kerberos database (7) - Server not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) > at > org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61) > at > org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104) >
[jira] [Resolved] (SPARK-29414) HasOutputCol param isSet() property is not preserved after persistence
[ https://issues.apache.org/jira/browse/SPARK-29414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-29414. -- Fix Version/s: 2.4.4 Resolution: Fixed Thanks [~borys.biletskyy], I'll mark this as resolved for 2.4.4 then. > HasOutputCol param isSet() property is not preserved after persistence > -- > > Key: SPARK-29414 > URL: https://issues.apache.org/jira/browse/SPARK-29414 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.3.2 >Reporter: Borys Biletskyy >Priority: Major > Fix For: 2.4.4 > > > HasOutputCol param isSet() property is not preserved after saving and loading > using DefaultParamsReadable and DefaultParamsWritable. > {code:java} > import pytest > from pyspark import keyword_only > from pyspark.ml import Model > from pyspark.sql import DataFrame > from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable > from pyspark.ml.param.shared import HasInputCol, HasOutputCol > from pyspark.sql.functions import * > class HasOutputColTester(Model, > HasInputCol, > HasOutputCol, > DefaultParamsReadable, > DefaultParamsWritable > ): > @keyword_only > def __init__(self, inputCol: str = None, outputCol: str = None): > super(HasOutputColTester, self).__init__() > kwargs = self._input_kwargs > self.setParams(**kwargs) > @keyword_only > def setParams(self, inputCol: str = None, outputCol: str = None): > kwargs = self._input_kwargs > self._set(**kwargs) > return self > def _transform(self, data: DataFrame) -> DataFrame: > return data > class TestHasInputColParam(object): > def test_persist_input_col_set(self, spark, temp_dir): > path = temp_dir + '/test_model' > model = HasOutputColTester() > assert not model.isDefined(model.inputCol) > assert not model.isSet(model.inputCol) > assert model.isDefined(model.outputCol) > assert not model.isSet(model.outputCol) > model.write().overwrite().save(path) > loaded_model: HasOutputColTester = HasOutputColTester.load(path) > assert not 
loaded_model.isDefined(model.inputCol) > assert not loaded_model.isSet(model.inputCol) > assert loaded_model.isDefined(model.outputCol) > assert not loaded_model.isSet(model.outputCol) # AssertionError: > assert not True > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations
[ https://issues.apache.org/jira/browse/SPARK-29508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29508. -- Resolution: Won't Fix > Implicitly cast strings in datetime arithmetic operations > - > > Key: SPARK-29508 > URL: https://issues.apache.org/jira/browse/SPARK-29508 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Maxim Gekk >Priority: Minor > > To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` > types in the cases: > # Cast string to interval in interval - string > # Cast string to interval in datetime + string or string + datetime > # Cast string to timestamp in datetime - string or string - datetime
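The three coercion cases proposed above (ultimately closed as Won't Fix) amount to a small type-resolution rule: the cast target for a string operand follows from the other operand's type and the operator. A sketch of that dispatch as a hypothetical helper — this is not Spark's analyzer code, just an illustration of the rule:

```java
// Hypothetical sketch of the coercion rule described in SPARK-29508.
// Spark's real logic lives in the analyzer's type-coercion phase.
public class DatetimeCoercion {
    public enum Type { STRING, DATE, TIMESTAMP, INTERVAL }

    // Returns the type a STRING operand should be cast to for the given
    // binary arithmetic op, or null if no implicit cast would apply.
    public static Type castTargetForString(Type other, char op) {
        if (other == Type.INTERVAL && op == '-') {
            return Type.INTERVAL;  // case 1: interval - string
        }
        if ((other == Type.DATE || other == Type.TIMESTAMP) && op == '+') {
            return Type.INTERVAL;  // case 2: datetime + string / string + datetime
        }
        if ((other == Type.DATE || other == Type.TIMESTAMP) && op == '-') {
            return Type.TIMESTAMP; // case 3: datetime - string / string - datetime
        }
        return null;
    }
}
```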
[jira] [Commented] (SPARK-29414) HasOutputCol param isSet() property is not preserved after persistence
[ https://issues.apache.org/jira/browse/SPARK-29414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959874#comment-16959874 ] Borys Biletskyy commented on SPARK-29414: - The test passes in v. 2.4.4 and persistence works as expected. > HasOutputCol param isSet() property is not preserved after persistence > -- > > Key: SPARK-29414 > URL: https://issues.apache.org/jira/browse/SPARK-29414 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.3.2 >Reporter: Borys Biletskyy >Priority: Major > > HasOutputCol param isSet() property is not preserved after saving and loading > using DefaultParamsReadable and DefaultParamsWritable. > {code:java} > import pytest > from pyspark import keyword_only > from pyspark.ml import Model > from pyspark.sql import DataFrame > from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable > from pyspark.ml.param.shared import HasInputCol, HasOutputCol > from pyspark.sql.functions import * > class HasOutputColTester(Model, > HasInputCol, > HasOutputCol, > DefaultParamsReadable, > DefaultParamsWritable > ): > @keyword_only > def __init__(self, inputCol: str = None, outputCol: str = None): > super(HasOutputColTester, self).__init__() > kwargs = self._input_kwargs > self.setParams(**kwargs) > @keyword_only > def setParams(self, inputCol: str = None, outputCol: str = None): > kwargs = self._input_kwargs > self._set(**kwargs) > return self > def _transform(self, data: DataFrame) -> DataFrame: > return data > class TestHasInputColParam(object): > def test_persist_input_col_set(self, spark, temp_dir): > path = temp_dir + '/test_model' > model = HasOutputColTester() > assert not model.isDefined(model.inputCol) > assert not model.isSet(model.inputCol) > assert model.isDefined(model.outputCol) > assert not model.isSet(model.outputCol) > model.write().overwrite().save(path) > loaded_model: HasOutputColTester = HasOutputColTester.load(path) > assert not 
loaded_model.isDefined(model.inputCol) > assert not loaded_model.isSet(model.inputCol) > assert loaded_model.isDefined(model.outputCol) > assert not loaded_model.isSet(model.outputCol) # AssertionError: > assert not True > {code}
[jira] [Created] (SPARK-29606) Improve EliminateOuterJoin performance
Yuming Wang created SPARK-29606: --- Summary: Improve EliminateOuterJoin performance Key: SPARK-29606 URL: https://issues.apache.org/jira/browse/SPARK-29606 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang How to reproduce: {code:scala} sql( """ |CREATE TABLE `big_table1`(`adj_type_id` tinyint, `byr_cntry_id` decimal(4,0), `sap_category_id` decimal(9,0), `lstg_site_id` decimal(9,0), `lstg_type_code` decimal(4,0), `offrd_slng_chnl_grp_id` smallint, `slr_cntry_id` decimal(4,0), `sold_slng_chnl_grp_id` smallint, `bin_lstg_yn_id` tinyint, `bin_sold_yn_id` tinyint, `lstg_curncy_id` decimal(4,0), `blng_curncy_id` decimal(4,0), `bid_count` decimal(18,0), `ck_trans_count` decimal(18,0), `ended_bid_count` decimal(18,0), `new_lstg_count` decimal(18,0), `ended_lstg_count` decimal(18,0), `ended_success_lstg_count` decimal(18,0), `item_sold_count` decimal(18,0), `gmv_us_amt` decimal(18,2), `gmv_byr_lc_amt` decimal(18,2), `gmv_slr_lc_amt` decimal(18,2), `gmv_lstg_curncy_amt` decimal(18,2), `gmv_us_m_amt` decimal(18,2), `rvnu_insrtn_fee_us_amt` decimal(18,6), `rvnu_insrtn_fee_lc_amt` decimal(18,6), `rvnu_insrtn_fee_bc_amt` decimal(18,6), `rvnu_insrtn_fee_us_m_amt` decimal(18,6), `rvnu_insrtn_crd_us_amt` decimal(18,6), `rvnu_insrtn_crd_lc_amt` decimal(18,6), `rvnu_insrtn_crd_bc_amt` decimal(18,6), `rvnu_insrtn_crd_us_m_amt` decimal(18,6), `rvnu_fetr_fee_us_amt` decimal(18,6), `rvnu_fetr_fee_lc_amt` decimal(18,6), `rvnu_fetr_fee_bc_amt` decimal(18,6), `rvnu_fetr_fee_us_m_amt` decimal(18,6), `rvnu_fetr_crd_us_amt` decimal(18,6), `rvnu_fetr_crd_lc_amt` decimal(18,6), `rvnu_fetr_crd_bc_amt` decimal(18,6), `rvnu_fetr_crd_us_m_amt` decimal(18,6), `rvnu_fv_fee_us_amt` decimal(18,6), `rvnu_fv_fee_slr_lc_amt` decimal(18,6), `rvnu_fv_fee_byr_lc_amt` decimal(18,6), `rvnu_fv_fee_bc_amt` decimal(18,6), `rvnu_fv_fee_us_m_amt` decimal(18,6), `rvnu_fv_crd_us_amt` decimal(18,6), `rvnu_fv_crd_byr_lc_amt` decimal(18,6), `rvnu_fv_crd_slr_lc_amt` 
decimal(18,6), `rvnu_fv_crd_bc_amt` decimal(18,6), `rvnu_fv_crd_us_m_amt` decimal(18,6), `rvnu_othr_l_fee_us_amt` decimal(18,6), `rvnu_othr_l_fee_lc_amt` decimal(18,6), `rvnu_othr_l_fee_bc_amt` decimal(18,6), `rvnu_othr_l_fee_us_m_amt` decimal(18,6), `rvnu_othr_l_crd_us_amt` decimal(18,6), `rvnu_othr_l_crd_lc_amt` decimal(18,6), `rvnu_othr_l_crd_bc_amt` decimal(18,6), `rvnu_othr_l_crd_us_m_amt` decimal(18,6), `rvnu_othr_nl_fee_us_amt` decimal(18,6), `rvnu_othr_nl_fee_lc_amt` decimal(18,6), `rvnu_othr_nl_fee_bc_amt` decimal(18,6), `rvnu_othr_nl_fee_us_m_amt` decimal(18,6), `rvnu_othr_nl_crd_us_amt` decimal(18,6), `rvnu_othr_nl_crd_lc_amt` decimal(18,6), `rvnu_othr_nl_crd_bc_amt` decimal(18,6), `rvnu_othr_nl_crd_us_m_amt` decimal(18,6), `rvnu_slr_tools_fee_us_amt` decimal(18,6), `rvnu_slr_tools_fee_lc_amt` decimal(18,6), `rvnu_slr_tools_fee_bc_amt` decimal(18,6), `rvnu_slr_tools_fee_us_m_amt` decimal(18,6), `rvnu_slr_tools_crd_us_amt` decimal(18,6), `rvnu_slr_tools_crd_lc_amt` decimal(18,6), `rvnu_slr_tools_crd_bc_amt` decimal(18,6), `rvnu_slr_tools_crd_us_m_amt` decimal(18,6), `rvnu_unasgnd_us_amt` decimal(18,6), `rvnu_unasgnd_lc_amt` decimal(18,6), `rvnu_unasgnd_bc_amt` decimal(18,6), `rvnu_unasgnd_us_m_amt` decimal(18,6), `rvnu_ad_fee_us_amt` decimal(18,6), `rvnu_ad_fee_lc_amt` decimal(18,6), `rvnu_ad_fee_bc_amt` decimal(18,6), `rvnu_ad_fee_us_m_amt` decimal(18,6), `rvnu_ad_crd_us_amt` decimal(18,6), `rvnu_ad_crd_lc_amt` decimal(18,6), `rvnu_ad_crd_bc_amt` decimal(18,6), `rvnu_ad_crd_us_m_amt` decimal(18,6), `rvnu_othr_ad_fee_us_amt` decimal(18,6), `rvnu_othr_ad_fee_lc_amt` decimal(18,6), `rvnu_othr_ad_fee_bc_amt` decimal(18,6), `rvnu_othr_ad_fee_us_m_amt` decimal(18,6), `cre_date` date, `cre_user` string, `upd_date` timestamp, `upd_user` string, `cmn_mtrc_summ_dt` date) |USING parquet PARTITIONED BY (`cmn_mtrc_summ_dt`) |""".stripMargin) sql( """ |CREATE TABLE `small_table1` (`CURNCY_ID` DECIMAL(9,0), `CURNCY_PLAN_RATE` DECIMAL(18,6), `CRE_DATE` DATE, `CRE_USER` 
STRING, `UPD_DATE` TIMESTAMP, `UPD_USER` STRING) |USING parquet |CLUSTERED BY (CURNCY_ID) |SORTED BY (CURNCY_ID) |INTO 1 BUCKETS |""".stripMargin) sql( """ |CREATE TABLE `small_table2` (`cntry_id` DECIMAL(4,0), `curncy_id` DECIMAL(4,0), `cntry_desc` STRING, `cntry_code` STRING, `iso_cntry_code` STRING, `cultural` STRING, `cntry_busn_unit` STRING, `high_vol_cntry_yn_id` TINYINT, `check_sil` TINYINT, `rev_rollup_id` SMALLINT, `rev_rollup` STRING, `prft_cntr_id` INT, `prft_cntr` STRING, `cre_date` DATE, `upd_date` TIMESTAMP, `cre_user` STRING, `upd_user` STRING) |USING parquet |""".stripMargin) sql( """ |CREATE TABLE `small_table3` (`rev_rollup_id` SMALLINT,
[jira] [Commented] (SPARK-28743) YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has too many entries
[ https://issues.apache.org/jira/browse/SPARK-28743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959858#comment-16959858 ] Anton Ippolitov commented on SPARK-28743: - Hi! We are seeing the exact same issue with Spark 2.4.4. More specifically, the issue arises only for a handful of our Spark jobs and only when we enable transport encryption ({{spark.network.crypto.enabled}} set to {{true}}). We are able to consistently reproduce the problem: i.e. the NodeManager OOMs every time we launch these particular jobs. When transport encryption is disabled, we don't see this issue anymore. I have tried bumping the NodeManager's memory via {{YARN_NODEMANAGER_HEAPSIZE}}: I set it to 8GB, 16GB and 32GB, but the NodeManager OOMs every time. I captured a couple of thread dumps from the NodeManager and they look very similar to the one posted by [~yangjiandan] (see screenshot). Would anyone have any insight into this issue? I would be happy to provide more information if needed. !Screen Shot 2019-10-25 at 17.24.10.png! > YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has > too many entries > -- > > Key: SPARK-28743 > URL: https://issues.apache.org/jira/browse/SPARK-28743 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.3.0 >Reporter: Jiandan Yang >Priority: Major > Attachments: Screen Shot 2019-10-25 at 17.24.10.png, dominator.jpg, > histo.jpg > > > The NodeManager heap size is 4G; io.netty.channel.ChannelOutboundBuffer$Entry > objects occupied about 2.8G according to MAT's histogram, and those Entries were > held by a ChannelOutboundBuffer according to MAT's dominator tree. By > analyzing one of the ChannelOutboundBuffer objects, I found there were 248867 > entries in it > (ChannelOutboundBuffer#flushed=248867), and > ChannelOutboundBuffer#totalPendingSize=23891232, which is more than the > high water mark (64K), and unwritable=1, meaning the send buffer was full.
But > the ChannelHandler does not seem to check the unwritable flag when writing > messages, and finally the NodeManager hits an OOM. > Histogram: > !histo.jpg! > dominator_tree: > !dominator.jpg!
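For context on the unwritable flag discussed in this report: Netty's ChannelOutboundBuffer tracks total pending bytes and marks the channel unwritable once a high water mark is crossed; well-behaved writers are expected to consult writability before queueing more data. A simplified model of that accounting — illustrative only, not Netty's actual class, though the 64 KiB default high water mark matches the 64K figure in the report:

```java
// Simplified sketch of Netty-style write-buffer accounting. Field names
// mirror ChannelOutboundBuffer, but this is a model, not the real class.
public class OutboundBufferModel {
    static final long HIGH_WATER_MARK = 64 * 1024; // Netty's default, 64 KiB
    static final long LOW_WATER_MARK = 32 * 1024;  // Netty's default, 32 KiB

    private long totalPendingSize = 0;
    private boolean unwritable = false;

    // Called when a write is queued. A writer that never checks
    // isWritable() keeps calling this, and entries pile up in memory --
    // the failure mode described in the heap dump above.
    public void addMessage(long size) {
        totalPendingSize += size;
        if (totalPendingSize > HIGH_WATER_MARK) {
            unwritable = true; // backpressure signal writers should honor
        }
    }

    // Called as the socket drains queued bytes.
    public void removeBytes(long size) {
        totalPendingSize -= size;
        if (totalPendingSize < LOW_WATER_MARK) {
            unwritable = false;
        }
    }

    public boolean isWritable() {
        return !unwritable;
    }
}
```

The reported dump (totalPendingSize around 23 MB against a 64 KiB high water mark, unwritable=1) is what this model produces when addMessage keeps being called while isWritable() is ignored.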
[jira] [Updated] (SPARK-28743) YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has too many entries
[ https://issues.apache.org/jira/browse/SPARK-28743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Ippolitov updated SPARK-28743: Attachment: Screen Shot 2019-10-25 at 17.24.10.png > YarnShuffleService leads to NodeManager OOM because ChannelOutboundBuffer has > too many entries > -- > > Key: SPARK-28743 > URL: https://issues.apache.org/jira/browse/SPARK-28743 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.3.0 >Reporter: Jiandan Yang >Priority: Major > Attachments: Screen Shot 2019-10-25 at 17.24.10.png, dominator.jpg, > histo.jpg > > > The NodeManager heap size is 4G; io.netty.channel.ChannelOutboundBuffer$Entry > objects occupied about 2.8G according to MAT's histogram, and those Entries were > held by a ChannelOutboundBuffer according to MAT's dominator tree. By > analyzing one of the ChannelOutboundBuffer objects, I found there were 248867 > entries in it > (ChannelOutboundBuffer#flushed=248867), and > ChannelOutboundBuffer#totalPendingSize=23891232, which is more than the > high water mark (64K), and unwritable=1, meaning the send buffer was full. But > the ChannelHandler does not seem to check the unwritable flag when writing > messages, and finally the NodeManager hits an OOM. > Histogram: > !histo.jpg! > dominator_tree: > !dominator.jpg!
[jira] [Created] (SPARK-29605) Optimize string to interval casting
Maxim Gekk created SPARK-29605: -- Summary: Optimize string to interval casting Key: SPARK-29605 URL: https://issues.apache.org/jira/browse/SPARK-29605 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Implement a new function, stringToInterval, in IntervalUtils to cast a UTF8String value to a CalendarInterval instance; it should be faster than the existing implementation.
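To make the task concrete, here is a naive token-based sketch of what such a cast must do; this is not Spark's code, and the real implementation in IntervalUtils scans the UTF8String bytes directly (rather than tokenizing) and builds a CalendarInterval with separate months and microseconds, which this sketch collapses into seconds:

```java
import java.util.Locale;

// Naive illustration of string-to-interval parsing for inputs like
// "3 days 4 hours". Not Spark's IntervalUtils; months/microseconds and
// many unit spellings supported by Spark are omitted here.
public class IntervalSketch {
    public static long toSeconds(String s) {
        String[] tokens = s.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (tokens.length == 0 || tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Malformed interval: " + s);
        }
        long total = 0;
        // Each (value, unit) pair contributes its span in seconds.
        for (int i = 0; i < tokens.length; i += 2) {
            long value = Long.parseLong(tokens[i]);
            switch (tokens[i + 1]) {
                case "day":    case "days":    total += value * 86400; break;
                case "hour":   case "hours":   total += value * 3600;  break;
                case "minute": case "minutes": total += value * 60;    break;
                case "second": case "seconds": total += value;         break;
                default:
                    throw new IllegalArgumentException("Unknown unit: " + tokens[i + 1]);
            }
        }
        return total;
    }
}
```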
[jira] [Commented] (SPARK-29574) spark with user provided hadoop doesn't work on kubernetes
[ https://issues.apache.org/jira/browse/SPARK-29574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959849#comment-16959849 ] Michał Wesołowski commented on SPARK-29574: --- I investigated the executor issue. It doesn't handle the SPARK_DIST_CLASSPATH environment variable because on Kubernetes org.apache.spark.executor.CoarseGrainedExecutorBackend is invoked directly and does not respect it. For the executor to "see" user-provided Hadoop dependencies, I modified the entrypoint script so that in the SPARK_K8S_CMD executor case it specifies the classpath with $SPARK_DIST_CLASSPATH {code:java} ... executor) CMD=( ${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $SPARK_EXECUTOR_ID --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $SPARK_EXECUTOR_POD_IP ) {code} So there are two problems: the driver doesn't see environment variables from $SPARK_HOME/conf/spark-env.sh because it gets hidden by the mounted config map, and the executor doesn't take $SPARK_DIST_CLASSPATH into account at all. > spark with user provided hadoop doesn't work on kubernetes > -- > > Key: SPARK-29574 > URL: https://issues.apache.org/jira/browse/SPARK-29574 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.4 >Reporter: Michał Wesołowski >Priority: Major > > When spark-submit is run with an image built from "hadoop free" Spark and > user-provided Hadoop, it fails on Kubernetes (the Hadoop libraries are not on > Spark's classpath). > I downloaded Spark [Pre-built with user-provided Apache > Hadoop|https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-without-hadoop.tgz]. > I created a docker image using > [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh].
> > > Based on this image (2.4.4-without-hadoop) > I created another one with a Dockerfile: > {code:java} > FROM spark-py:2.4.4-without-hadoop > ENV SPARK_HOME=/opt/spark/ > # This is needed for newer kubernetes versions > ADD > https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.6.1/kubernetes-client-4.6.1.jar > $SPARK_HOME/jars > COPY spark-env.sh /opt/spark/conf/spark-env.sh > RUN chmod +x /opt/spark/conf/spark-env.sh > RUN wget -qO- > https://www-eu.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz > | tar xz -C /opt/ > ENV HADOOP_HOME=/opt/hadoop-3.2.1 > ENV PATH=${HADOOP_HOME}/bin:${PATH} > {code} > Contents of spark-env.sh: > {code:java} > #!/usr/bin/env bash > export SPARK_DIST_CLASSPATH=$(hadoop > classpath):$HADOOP_HOME/share/hadoop/tools/lib/* > export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native > {code} > spark-submit run with an image created this way fails, since spark-env.sh is > overwritten by the [volume created when the pod > starts|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L108] > As a quick workaround I tried to modify the [entrypoint > script|https://github.com/apache/spark/blob/ea8b5df47476fe66b63bd7f7bcd15acfb80bde78/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh] > to run spark-env.sh during startup, moving spark-env.sh to a different > directory. 
> The driver starts without issues in this setup; however, even though > SPARK_DIST_CLASSPATH is set, the executor is constantly failing: > {code:java} > PS > C:\Sandbox\projekty\roboticdrive-analytics\components\docker-images\spark-rda> > kubectl logs rda-script-1571835692837-exec-12 > ++ id -u > + myuid=0 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 0 > + uidentry=root:x:0:0:root:/root:/bin/ash > + set -e > + '[' -z root:x:0:0:root:/root:/bin/ash ']' > + source /opt/spark-env.sh > +++ hadoop classpath > ++ export > 'SPARK_DIST_CLASSPATH=/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoo++ > > SPARK_DIST_CLASSPATH='/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yar
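The workaround described above amounts to joining the two classpath variables when building the executor command line. A minimal sketch of that logic (plain Python, function name hypothetical, not Spark code):

```python
def executor_classpath(env):
    # Mirror the patched entrypoint: put SPARK_CLASSPATH first, then
    # append SPARK_DIST_CLASSPATH (the user-provided Hadoop jars),
    # dropping any component that is empty or unset.
    parts = [env.get("SPARK_CLASSPATH", ""), env.get("SPARK_DIST_CLASSPATH", "")]
    return ":".join(p for p in parts if p)

print(executor_classpath({
    "SPARK_CLASSPATH": "/opt/spark/jars/*",
    "SPARK_DIST_CLASSPATH": "/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/*",
}))
# /opt/spark/jars/*:/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/*
```

If SPARK_DIST_CLASSPATH is unset, the result degrades gracefully to SPARK_CLASSPATH alone, which is the pre-patch behaviour.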
[jira] [Resolved] (SPARK-29527) SHOW CREATE TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29527. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26184 [https://github.com/apache/spark/pull/26184] > SHOW CREATE TABLE should look up catalog/table like v2 commands > --- > > Key: SPARK-29527 > URL: https://issues.apache.org/jira/browse/SPARK-29527 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.0.0 > > > SHOW CREATE TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29545) Implement bitwise integer aggregates bit_xor
[ https://issues.apache.org/jira/browse/SPARK-29545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-29545. -- Resolution: Fixed Resolved by [https://github.com/apache/spark/pull/26205] > Implement bitwise integer aggregates bit_xor > > > Key: SPARK-29545 > URL: https://issues.apache.org/jira/browse/SPARK-29545 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > As we support bit_and and bit_or now, we should also support the related aggregate > function bit_xor ahead of PostgreSQL, because many other popular databases > support it. > > [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html] > [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or] > [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
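For reference, the aggregate's semantics can be sketched outside Spark: XOR-fold the non-null values of a group, with an empty (or all-null) group yielding null. This is a plain-Python illustration of the semantics, not Spark's implementation:

```python
from functools import reduce

def bit_xor(values):
    # XOR-fold the non-null values of a group; an all-null or empty
    # group aggregates to None, in line with bit_and/bit_or behaviour.
    vals = [v for v in values if v is not None]
    return reduce(lambda a, b: a ^ b, vals) if vals else None

print(bit_xor([3, 5, 6]))  # 3 ^ 5 ^ 6 = 0
```

Because XOR is its own inverse, aggregating a value an even number of times cancels it out, which is what makes BIT_XOR useful for parity-style checks.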
[jira] [Assigned] (SPARK-29500) Support partition column when writing to Kafka
[ https://issues.apache.org/jira/browse/SPARK-29500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29500: Assignee: Nicola Bova > Support partition column when writing to Kafka > -- > > Key: SPARK-29500 > URL: https://issues.apache.org/jira/browse/SPARK-29500 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Nicola Bova >Assignee: Nicola Bova >Priority: Major > Labels: starter > > When writing to a Kafka topic, `KafkaWriter` does not support selecting the > output Kafka partition through a DataFrame column. > While it is possible to configure a custom Kafka Partitioner with > `.option("kafka.partitioner.class", "my.custom.Partitioner")`, this is not enough > for certain use cases. > After the introduction of GDPR, it is a common pattern to emit records with > unique Kafka keys, which allows tombstoning individual records. > This strategy implies that the totality of the key information cannot be used > to calculate the topic partition, and users need to resort to custom > partitioners. > However, as stated at > [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#kafka-specific-configurations], > "Keys/Values are always serialized with ByteArraySerializer or > StringSerializer. Use DataFrame operations to explicitly serialize > keys/values into either strings or byte arrays." > Therefore, a custom partitioner would need to > - deserialize the key (or value) > - calculate the output partition using a subset of the key (or value) fields > This is inefficient because it requires an unnecessary deserialization step. > It also makes it impossible to use the Spark batch writer to send Kafka > tombstones when the partition is calculated from a subset of the Kafka value. 
> It would be a nice addition to let the user choose a partition by setting a > value in the "partition" column of the dataframe, as already done for > `topic`, `key`, `value`, and `headers` in `KafkaWriter`, also mirroring the > `ProducerRecord` API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
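The difference between the two approaches can be sketched in plain Python (hypothetical names, not Spark or Kafka client code): with a partition column, the writer can use the row's value directly, whereas a custom partitioner only ever sees serialized bytes and must deserialize the key to reach its fields:

```python
import zlib

def choose_partition(row, num_partitions):
    # Proposed behaviour: if the row carries an explicit 'partition'
    # value (mirroring the ProducerRecord API), use it directly...
    if row.get("partition") is not None:
        return row["partition"] % num_partitions
    # ...otherwise fall back to hashing the serialized key bytes, which
    # is all a custom Kafka partitioner gets to see.
    return zlib.crc32(row["key"]) % num_partitions

print(choose_partition({"key": b"user-42", "partition": 7}, 12))  # 7
```

In the fallback branch, any partitioning scheme based on a *subset* of the key's fields would first have to parse the bytes back into a structure, which is exactly the extra deserialization step the issue objects to.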
[jira] [Resolved] (SPARK-29500) Support partition column when writing to Kafka
[ https://issues.apache.org/jira/browse/SPARK-29500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29500. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26153 [https://github.com/apache/spark/pull/26153] > Support partition column when writing to Kafka > -- > > Key: SPARK-29500 > URL: https://issues.apache.org/jira/browse/SPARK-29500 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Nicola Bova >Assignee: Nicola Bova >Priority: Major > Labels: starter > Fix For: 3.0.0 > > > When writing to a Kafka topic, `KafkaWriter` does not support selecting the > output Kafka partition through a DataFrame column. > While it is possible to configure a custom Kafka Partitioner with > `.option("kafka.partitioner.class", "my.custom.Partitioner")`, this is not enough > for certain use cases. > After the introduction of GDPR, it is a common pattern to emit records with > unique Kafka keys, which allows tombstoning individual records. > This strategy implies that the totality of the key information cannot be used > to calculate the topic partition, and users need to resort to custom > partitioners. > However, as stated at > [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#kafka-specific-configurations], > "Keys/Values are always serialized with ByteArraySerializer or > StringSerializer. Use DataFrame operations to explicitly serialize > keys/values into either strings or byte arrays." > Therefore, a custom partitioner would need to > - deserialize the key (or value) > - calculate the output partition using a subset of the key (or value) fields > This is inefficient because it requires an unnecessary deserialization step. 
> It also makes it impossible to use Spark batch writer to send Kafka > tombstones when the partition is calculated from a subset of the kafka value. > It would be a nice addition to let the user choose a partition by setting a > value in the "partition" column of the dataframe, as already done for > `topic`, `key`, `value`, and `headers` in `KafkaWriter`, also mirroring the > `ProducerRecord` API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set
[ https://issues.apache.org/jira/browse/SPARK-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959711#comment-16959711 ] Jungtaek Lim commented on SPARK-29604: -- I've figured out the root cause and have a patch. Will submit a patch soon. I may need some more time to craft a relevant test. > SessionState is initialized with isolated classloader for Hive if > spark.sql.hive.metastore.jars is being set > > > Key: SPARK-29604 > URL: https://issues.apache.org/jira/browse/SPARK-29604 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jungtaek Lim >Priority: Major > > I've observed the issue that external listeners cannot be loaded properly > when we run spark-sql with "spark.sql.hive.metastore.jars" configuration > being used. > {noformat} > Exception in thread "main" java.lang.IllegalArgumentException: Error while > instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153) > at > org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103) > at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149) > at > 
org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296) > at > org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214) > at > org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114) > at > org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.d
[jira] [Created] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set
Jungtaek Lim created SPARK-29604: Summary: SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set Key: SPARK-29604 URL: https://issues.apache.org/jira/browse/SPARK-29604 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4, 3.0.0 Reporter: Jungtaek Lim I've observed the issue that external listeners cannot be loaded properly when we run spark-sql with "spark.sql.hive.metastore.jars" configuration being used. {noformat} Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder': at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102) at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154) at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153) at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150) at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104) at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104) at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103) at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149) at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247) at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296) at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214) at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114) at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102) at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: org.apache.spark.SparkException: Exception when registering StreamingQueryListener at org.apache.spark.sql.streaming.StreamingQueryManager.<init>(StreamingQueryManager.scala:70) at org.apache.spark.sql.internal.BaseSessionStateBuilder.streamingQueryManager(BaseSessionStateBuilder.scala:260) at org.apache.spar
[jira] [Created] (SPARK-29603) Support application priority for spark on yarn
Kent Yao created SPARK-29603: Summary: Support application priority for spark on yarn Key: SPARK-29603 URL: https://issues.apache.org/jira/browse/SPARK-29603 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.0.0 Reporter: Kent Yao We can set a priority on an application so that YARN can order pending applications; those with higher priority have a better chance of being activated. This applies to the YARN CapacityScheduler only. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
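As an illustration of the kind of ordering policy described above (a toy model in plain Python, not YARN code): higher priority is activated first, with FIFO order within a priority level:

```python
import heapq
from itertools import count

class PendingApps:
    # Toy pending-application queue: higher priority activates first;
    # ties are broken by submission order (FIFO).
    def __init__(self):
        self._heap, self._seq = [], count()

    def submit(self, app_id, priority=0):
        # Negate priority because heapq is a min-heap.
        heapq.heappush(self._heap, (-priority, next(self._seq), app_id))

    def activate_next(self):
        return heapq.heappop(self._heap)[2]

q = PendingApps()
q.submit("app-1")
q.submit("app-2", priority=10)
q.submit("app-3")
print(q.activate_next())  # app-2 (highest priority jumps the queue)
```

The sequence counter is what preserves submission order among equal-priority applications; without it, ties would be broken arbitrarily.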
[jira] [Reopened] (SPARK-9612) Add instance weight support for GBTs
[ https://issues.apache.org/jira/browse/SPARK-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reopened SPARK-9612: - Assignee: zhengruifeng (was: DB Tsai) > Add instance weight support for GBTs > > > Key: SPARK-9612 > URL: https://issues.apache.org/jira/browse/SPARK-9612 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley >Assignee: zhengruifeng >Priority: Minor > Labels: bulk-closed > > GBT support for instance weights could be handled by: > * sampling data before passing it to trees > * passing weights to trees (requiring weight support for trees first, but > probably better in the end) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9612) Add instance weight support for GBTs
[ https://issues.apache.org/jira/browse/SPARK-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-9612. - Resolution: Fixed > Add instance weight support for GBTs > > > Key: SPARK-9612 > URL: https://issues.apache.org/jira/browse/SPARK-9612 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley >Assignee: zhengruifeng >Priority: Minor > Labels: bulk-closed > > GBT support for instance weights could be handled by: > * sampling data before passing it to trees > * passing weights to trees (requiring weight support for trees first, but > probably better in the end) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29599) Support pagination for session table in JDBC/ODBC Tab
[ https://issues.apache.org/jira/browse/SPARK-29599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-29599: -- Summary: Support pagination for session table in JDBC/ODBC Tab (was: Support pagination for session table in JDBC/ODBC Session page ) > Support pagination for session table in JDBC/ODBC Tab > -- > > Key: SPARK-29599 > URL: https://issues.apache.org/jira/browse/SPARK-29599 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 2.4.0, 3.0.0 >Reporter: angerszhu >Priority: Minor > > Support pagination for session table in JDBC/ODBC Session page -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29602) How does the spark from_json json and dataframe transform ignore the case of the json key
ruiliang created SPARK-29602: Summary: How does the spark from_json json and dataframe transform ignore the case of the json key Key: SPARK-29602 URL: https://issues.apache.org/jira/browse/SPARK-29602 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 2.4.4 Reporter: ruiliang How can Spark's from_json and DataFrame transforms ignore the case of JSON keys? Code:
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder().master("local[*]").enableHiveSupport().getOrCreate()
  //spark.sqlContext.setConf("spark.sql.caseSensitive", "false")
  import spark.implicits._

  // Hive table data is lower-cased automatically when saved
  val hivetable = """{"deliverysystype":"dms","orderid":"B0001-N103-000-005882-RL3AI2RWCP","storeid":103,"timestamp":1571587522000,"":"dms"}"""
  val hiveDF = Seq(hivetable).toDF("msg")
  val rdd = hiveDF.rdd.map(_.getString(0))
  val jsonDataDF = spark.read.json(rdd.toDS())
  jsonDataDF.show(false)
  // +---+---------------+--------------------------------+-------+-------------+
  // |   |deliverysystype|orderid                         |storeid|timestamp    |
  // +---+---------------+--------------------------------+-------+-------------+
  // |dms|dms            |B0001-N103-000-005882-RL3AI2RWCP|103    |1571587522000|
  // +---+---------------+--------------------------------+-------+-------------+

  val jsonstr = """{"data":{"deliverySysType":"dms","orderId":"B0001-N103-000-005882-RL3AI2RWCP","storeId":103,"timestamp":1571587522000},"accessKey":"f9d069861dfb1678","actionName":"candao.rider.getDeliveryInfo","sign":"fa0239c75e065cf43d0a4040665578ba" }"""
  val jsonStrDF = Seq(jsonstr).toDF("msg")
  // convert the json data column; note action_name vs actionName
  jsonStrDF.show(false)
  val structSeqSchme = StructType(Seq(
    StructField("data", jsonDataDF.schema, true),
    StructField("accessKey", StringType, true), // this should be accessKey
    StructField("actionName", StringType, true),
    StructField("columnNameOfCorruptRecord", StringType, true)))
  // Hive column names are lower case while the json keys are mixed case, so no value is picked up
  val mapOption = Map("allowBackslashEscapingAnyCharacter" -> "true",
    "allowUnquotedControlChars" -> "true",
    "allowSingleQuotes" -> "true")
  // None of these options help here; is there an option that can be set for this?
  val newDF = jsonStrDF.withColumn("data_col", from_json(col("msg"), structSeqSchme, mapOption))
  newDF.show(false)
  newDF.printSchema()
  newDF.select($"data_col.accessKey", $"data_col.actionName", $"data_col.data.*", $"data_col.columnNameOfCorruptRecord").show(false)
  // The lower-case columns do not fetch any data. How can the lookup ignore case? deliverysystype, storeid -> null
  // +----------------+----------------------------+----+---------------+-------+-------+-------------+-------------------------+
  // |accessKey       |actionName                  |    |deliverysystype|orderid|storeid|timestamp    |columnNameOfCorruptRecord|
  // +----------------+----------------------------+----+---------------+-------+-------+-------------+-------------------------+
  // |f9d069861dfb1678|candao.rider.getDeliveryInfo|null|null           |null   |null   |1571587522000|null                     |
  // +----------------+----------------------------+----+---------------+-------+-------+-------------+-------------------------+
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
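One common workaround for the question above (a sketch of the idea in plain Python, not a Spark API) is to normalize the JSON keys to lower case before parsing, so they match the lower-cased Hive schema; in Spark this could be done, for example, in a UDF applied to the string column before from_json:

```python
import json

def lower_keys(obj):
    # Recursively lower-case object keys so mixed-case input matches a
    # schema whose field names Hive has already lower-cased.
    if isinstance(obj, dict):
        return {k.lower(): lower_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [lower_keys(v) for v in obj]
    return obj

raw = '{"deliverySysType": "dms", "storeId": 103}'
print(json.dumps(lower_keys(json.loads(raw))))
# {"deliverysystype": "dms", "storeid": 103}
```

Note that lower-casing keys can collide if the input legitimately contains keys that differ only by case; a production version would have to decide how to resolve such collisions.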
[jira] [Commented] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient
[ https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959512#comment-16959512 ] Gabor Somogyi commented on SPARK-29580: --- I think `It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector` appears because the issue happens in `beforeAll`. I've tried to reproduce it, but no luck so far. Additionally, I've taken a look at the Jenkins logs, but they don't show why this happened. I think we should add further debug log information in a PR and then * try to reproduce it further * wait on Jenkins and take a look at the logs when it happens again > KafkaDelegationTokenSuite fails to create new KafkaAdminClient > -- > > Key: SPARK-29580 > URL: https://issues.apache.org/jira/browse/SPARK-29580 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ > {code} > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) > at > 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Server not found in Kerberos > database (7) - Server not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382) > ... 
16 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Server not found in Kerberos database (7) - Server not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.logi
[jira] [Resolved] (SPARK-29461) Spark dataframe writer does not expose metrics for JDBC writer
[ https://issues.apache.org/jira/browse/SPARK-29461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-29461. -- Fix Version/s: 3.0.0 Assignee: Jungtaek Lim Resolution: Fixed Resolved by [https://github.com/apache/spark/pull/26109] > Spark dataframe writer does not expose metrics for JDBC writer > --- > > Key: SPARK-29461 > URL: https://issues.apache.org/jira/browse/SPARK-29461 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: ROHIT KALHANS >Assignee: Jungtaek Lim >Priority: Minor > Fix For: 3.0.0 > > > Spark does not expose the writer metrics when using the Dataframe JDBC > writer. Similar instances of such bugs have been fixed in previous versions. > However, it seems the fix was not exhaustive since it does not cover all the > writers. > Similar bugs: > https://issues.apache.org/jira/browse/SPARK-21882 > https://issues.apache.org/jira/browse/SPARK-22605 > > > Console reporter output > app-name.1.executor.bytesWritten > count = 0 > > app-name.1.executor.recordsWritten > count = 0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org