[jira] [Updated] (SPARK-24352) Flaky test: StandaloneDynamicAllocationSuite
[ https://issues.apache.org/jira/browse/SPARK-24352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24352: Issue Type: Test (was: Bug) > Flaky test: StandaloneDynamicAllocationSuite > > > Key: SPARK-24352 > URL: https://issues.apache.org/jira/browse/SPARK-24352 > Project: Spark > Issue Type: Test > Components: Spark Core, Tests >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Major > Fix For: 2.3.4, 2.4.4, 3.0.0 > > > From jenkins: > [https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/job/spark-branch-2.3-test-maven-hadoop-2.6/384/testReport/junit/org.apache.spark.deploy/StandaloneDynamicAllocationSuite/executor_registration_on_a_blacklisted_host_must_fail/] > > {noformat} > Error Message > There is already an RpcEndpoint called CoarseGrainedScheduler > Stacktrace > java.lang.IllegalArgumentException: There is already an RpcEndpoint > called CoarseGrainedScheduler > at > org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:71) > at > org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:130) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.createDriverEndpointRef(CoarseGrainedSchedulerBackend.scala:396) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:391) > at > org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.start(StandaloneSchedulerBackend.scala:61) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply$mcV$sp(StandaloneDynamicAllocationSuite.scala:512) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495) > at > org.apache.spark.deploy.StandaloneDynamicAllocationSuite$$anonfun$1.apply(StandaloneDynamicAllocationSuite.scala:495) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > {noformat} > This actually looks like a previous test is leaving some stuff running and > making this one fail. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
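The root cause described above (a leaked CoarseGrainedScheduler endpoint from an earlier test) is typically fixed by tearing the scheduler backend down between tests. Below is a minimal ScalaTest sketch of that pattern; the suite and test names are hypothetical, not the actual StandaloneDynamicAllocationSuite code.

{code:lang=scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class CleanTeardownSuite extends FunSuite with BeforeAndAfterEach {
  private var sc: SparkContext = _

  override def afterEach(): Unit = {
    try {
      // Stopping the context stops the scheduler backend, which unregisters
      // its "CoarseGrainedScheduler" RpcEndpoint so the next test can
      // register a fresh one.
      if (sc != null) {
        sc.stop()
        sc = null
      }
    } finally {
      super.afterEach()
    }
  }

  test("runs against a fresh scheduler backend") {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("t"))
    assert(sc.parallelize(1 to 10).count() === 10)
  }
}
{code}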
[jira] [Updated] (SPARK-28535) Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"
[ https://issues.apache.org/jira/browse/SPARK-28535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28535: Issue Type: Test (was: Bug) > Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader" > --- > > Key: SPARK-28535 > URL: https://issues.apache.org/jira/browse/SPARK-28535 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.3.3, 3.0.0, 2.4.3 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Minor > Fix For: 2.3.4, 2.4.4, 3.0.0 > > > This is the same flakiness as in SPARK-23881, except the fix there didn't > really take, at least on our build machines. > {noformat} > org.scalatest.exceptions.TestFailedException: 1 was not less than 1 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > {noformat} > Since that bug is short on explanations, the issue is that there's a race > between the thread posting the "stage completed" event to the listener which > unblocks the test, and the thread killing the task in the executor. If the event arrives first, it will unblock task execution, and there's a chance that > all elements will actually be processed before the executor has a chance to > stop the task. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
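The usual way to close this kind of race is to make the test wait on a signal from the task itself rather than on a listener event. A local-mode sketch under that assumption; the object and names below are illustrative, not the suite's real code.

{code:lang=scala}
import java.util.concurrent.Semaphore

import org.apache.spark.{SparkConf, SparkContext}

// The semaphore lives in its own object so the task closure references it
// statically instead of capturing a non-serializable field (Spark
// serializes closures even in local mode).
object Signals {
  val taskStarted = new Semaphore(0)
}

object CancelRaceSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("cancel-race"))
    val job = new Thread(new Runnable {
      override def run(): Unit = {
        try {
          sc.parallelize(1 to 1000, 1).map { i =>
            if (i == 1) Signals.taskStarted.release() // signal: task is running
            Thread.sleep(10)
            i
          }.count()
        } catch {
          case _: Exception => () // cancellation surfaces as an exception here
        }
      }
    })
    job.start()
    Signals.taskStarted.acquire() // wait for the task itself, not a listener event
    sc.cancelAllJobs()            // the kill now races only with element processing
    job.join()
    sc.stop()
  }
}
{code}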
[jira] [Updated] (SPARK-28418) Flaky Test: pyspark.sql.tests.test_dataframe: test_query_execution_listener_on_collect
[ https://issues.apache.org/jira/browse/SPARK-28418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28418: Issue Type: Test (was: Bug) > Flaky Test: pyspark.sql.tests.test_dataframe: > test_query_execution_listener_on_collect > -- > > Key: SPARK-28418 > URL: https://issues.apache.org/jira/browse/SPARK-28418 > Project: Spark > Issue Type: Test > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.3.4, 2.4.4, 3.0.0 > > > {code} > ERROR [0.164s]: test_query_execution_listener_on_collect > (pyspark.sql.tests.test_dataframe.QueryExecutionListenerTests) > -- > Traceback (most recent call last): > File "/home/jenkins/python/pyspark/sql/tests/test_dataframe.py", line 758, > in test_query_execution_listener_on_collect > "The callback from the query execution listener should be called after > 'collect'") > AssertionError: The callback from the query execution listener should be > called after 'collect' > {code} > It seems the test can fail because it does not wait for the listener events to be processed. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
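For reference, the Scala-side shape of the fix the issue suggests: listener events are delivered asynchronously, so poll for the callback with a deadline instead of asserting right after collect(). A self-contained sketch; the flag and timeout are illustrative.

{code:lang=scala}
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().master("local").appName("listener").getOrCreate()
val callbackFired = new AtomicBoolean(false)

spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    callbackFired.set(true)
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
})

spark.range(10).collect()

// The listener bus delivers events on another thread, so give the callback a
// bounded window rather than asserting immediately.
val deadline = System.currentTimeMillis() + 10000
while (!callbackFired.get() && System.currentTimeMillis() < deadline) {
  Thread.sleep(50)
}
assert(callbackFired.get(), "listener callback was not called after 'collect'")
{code}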
[jira] [Updated] (SPARK-28335) Flaky test: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset recovery from kafka
[ https://issues.apache.org/jira/browse/SPARK-28335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28335: Issue Type: Test (was: Bug) > Flaky test: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset > recovery from kafka > - > > Key: SPARK-28335 > URL: https://issues.apache.org/jira/browse/SPARK-28335 > Project: Spark > Issue Type: Test > Components: DStreams, Tests >Affects Versions: 2.1.3, 2.2.3, 2.3.3, 3.0.0, 2.4.3 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Minor > Fix For: 2.3.4, 2.4.4, 3.0.0 > > Attachments: bad.log > > > {code:java} > org.scalatest.exceptions.TestFailedException: {} was empty > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite$$anonfun$6.apply$mcV$sp(DirectKafkaStreamSuite.scala:466) > at > org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite$$anonfun$6.apply(DirectKafkaStreamSuite.scala:416) > at > org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite$$anonfun$6.apply(DirectKafkaStreamSuite.scala:416) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at or > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
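Assertions like "{} was empty" on asynchronously produced state are usually stabilized with ScalaTest's eventually, which retries the block until it passes or times out. A sketch of the idiom Spark's tests commonly use; committedOffsets is a hypothetical stand-in for the state the real suite checks.

{code:lang=scala}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Hypothetical stand-in for the offsets the suite reads back after recovery.
def committedOffsets: Map[Int, Long] = Map.empty

// Retry the assertion every 100 ms for up to 30 s instead of asserting once.
eventually(timeout(30.seconds), interval(100.milliseconds)) {
  assert(committedOffsets.nonEmpty, "no offsets committed yet")
}
{code}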
[jira] [Updated] (SPARK-28357) Fix Flaky Test - FileAppenderSuite.rolling file appender - size-based rolling compressed
[ https://issues.apache.org/jira/browse/SPARK-28357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28357: Issue Type: Test (was: Bug) > Fix Flaky Test - FileAppenderSuite.rolling file appender - size-based rolling > compressed > > > Key: SPARK-28357 > URL: https://issues.apache.org/jira/browse/SPARK-28357 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 2.3.4, 2.4.4, 3.0.0 > > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107553/testReport/org.apache.spark.util/FileAppenderSuite/rolling_file_appender___size_based_rolling__compressed_/ -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24898) Adding spark.checkpoint.compress to the docs
[ https://issues.apache.org/jira/browse/SPARK-24898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24898: Issue Type: Improvement (was: Task) > Adding spark.checkpoint.compress to the docs > > > Key: SPARK-24898 > URL: https://issues.apache.org/jira/browse/SPARK-24898 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 2.2.0 >Reporter: Riccardo Corbella >Assignee: Sandeep >Priority: Trivial > Fix For: 2.3.4, 2.4.4, 3.0.0 > > > Parameter *spark.checkpoint.compress* is not listed under configuration > properties. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
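For context, a minimal sketch of the property being documented; the checkpoint directory is a placeholder.

{code:lang=scala}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("checkpoint-compress")
  .set("spark.checkpoint.compress", "true") // compress checkpointed RDD data

val sc = new SparkContext(conf)
sc.setCheckpointDir("/tmp/checkpoints") // placeholder path
val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint()
rdd.count() // the action materializes the (now compressed) checkpoint
{code}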
[jira] [Updated] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
[ https://issues.apache.org/jira/browse/SPARK-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28261: Issue Type: Test (was: Bug) > Flaky test: > org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable > --- > > Key: SPARK-28261 > URL: https://issues.apache.org/jira/browse/SPARK-28261 > Project: Spark > Issue Type: Test > Components: Spark Core, Tests >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 3.0.0, 2.4.3 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Minor > Fix For: 2.3.4, 2.4.4, 3.0.0 > > > Error message: > {noformat} > java.lang.AssertionError: expected:<3> but was:<4> > ...{noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28247) Flaky test: "query without test harness" in ContinuousSuite
[ https://issues.apache.org/jira/browse/SPARK-28247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28247: Issue Type: Test (was: Bug) > Flaky test: "query without test harness" in ContinuousSuite > --- > > Key: SPARK-28247 > URL: https://issues.apache.org/jira/browse/SPARK-28247 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > This test has failed a few times in some PRs, as well as easy to reproduce > locally. Example of a failure: > {noformat} > [info] - query without test harness *** FAILED *** (2 seconds, 931 > milliseconds) > [info] scala.Predef.Set.apply[Int](0, 1, 2, > 3).map[org.apache.spark.sql.Row, > scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => > org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) > was false > (ContinuousSuite.scala:226){noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28713) Bump checkstyle from 8.14 to 8.23
[ https://issues.apache.org/jira/browse/SPARK-28713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28713: Issue Type: Improvement (was: Task) > Bump checkstyle from 8.14 to 8.23 > - > > Key: SPARK-28713 > URL: https://issues.apache.org/jira/browse/SPARK-28713 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > From the GitHub Security Advisory Database: > Moderate severity vulnerability that affects com.puppycrawl.tools:checkstyle > Checkstyle prior to 8.18 loads external DTDs by default, which can > potentially lead to denial of service attacks or the leaking of confidential > information. > Affected versions: < 8.18 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27596) The JDBC 'query' option doesn't work for Oracle database
[ https://issues.apache.org/jira/browse/SPARK-27596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-27596: Issue Type: Bug (was: Improvement) > The JDBC 'query' option doesn't work for Oracle database > > > Key: SPARK-27596 > URL: https://issues.apache.org/jira/browse/SPARK-27596 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.2 >Reporter: Xiao Li >Assignee: Dilip Biswal >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > For the JDBC option `query`, we use an identifier name that starts with an > underscore: s"(${subquery}) > __SPARK_GEN_JDBC_SUBQUERY_NAME_${curId.getAndIncrement()}". This is not > supported by Oracle. > Oracle doesn't seem to support identifier names that start with a non-alphabetic > character (unless quoted), and it has length restrictions as well. > https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements008.htm > {code:java} > Nonquoted identifiers must begin with an alphabetic character from your > database character set. Quoted identifiers can begin with any character as > per below documentation - > Nonquoted identifiers can contain only alphanumeric characters from your > database character set and the underscore (_), dollar sign ($), and pound > sign (#). Database links can also contain periods (.) and "at" signs (@). > Oracle strongly discourages you from using $ and # in nonquoted identifiers. > {code} > The alias name '_SPARK_GEN_JDBC_SUBQUERY_NAME' should be fixed to > remove the "__" prefix (or be quoted; not sure if that may impact other > sources) to make it work for Oracle. Also, the length should be limited, as it > hits the below error once the prefix is removed. > {code:java} > java.sql.SQLSyntaxErrorException: ORA-00972: identifier is too long > {code} > It can be verified using the sqlfiddle link below. > http://www.sqlfiddle.com/#!4/9bbe9a/10050 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
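To make the failure mode concrete, this is how the `query` option is used; Spark wraps the supplied query in a subquery with a generated alias, and that alias is what Oracle rejects. Connection details below are placeholders.

{code:lang=scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("jdbc-query").getOrCreate()
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/service") // placeholder URL
  .option("query", "SELECT owner, table_name FROM all_tables")
  .option("user", "scott")     // placeholder credentials
  .option("password", "****")
  .load()
// Spark rewrites the read roughly as
//   SELECT * FROM (SELECT owner, table_name FROM all_tables) <generated alias>
// so the generated alias must satisfy Oracle's identifier rules.
{code}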
[jira] [Updated] (SPARK-28642) Hide credentials in show create table
[ https://issues.apache.org/jira/browse/SPARK-28642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28642: Issue Type: Bug (was: Improvement) > Hide credentials in show create table > - > > Key: SPARK-28642 > URL: https://issues.apache.org/jira/browse/SPARK-28642 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > {code:sql} > spark-sql> show create table mysql_federated_sample; > CREATE TABLE `mysql_federated_sample` (`TBL_ID` BIGINT, `CREATE_TIME` INT, > `DB_ID` BIGINT, `LAST_ACCESS_TIME` INT, `OWNER` STRING, `RETENTION` INT, > `SD_ID` BIGINT, `TBL_NAME` STRING, `TBL_TYPE` STRING, `VIEW_EXPANDED_TEXT` > STRING, `VIEW_ORIGINAL_TEXT` STRING, `IS_REWRITE_ENABLED` BOOLEAN) > USING org.apache.spark.sql.jdbc > OPTIONS ( > `url` 'jdbc:mysql://localhost/hive?user=root&password=mypasswd', > `driver` 'com.mysql.jdbc.Driver', > `dbtable` 'TBLS' > ) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
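For illustration only, the kind of redaction the fix needs to apply before echoing options back to the user; this standalone regex is a sketch, not the code path Spark actually uses.

{code:lang=scala}
val option = "jdbc:mysql://localhost/hive?user=root&password=mypasswd"
// Replace the password value while keeping the rest of the URL intact.
val redacted = option.replaceAll("(?i)(password)=[^&]+", "$1=*********")
println(redacted) // jdbc:mysql://localhost/hive?user=root&password=*********
{code}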
[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Component/s: (was: Spark Core) > Create View Commands Fails with The view output (col1,col1) contains > duplicate column name > --- > > Key: SPARK-23519 > URL: https://issues.apache.org/jira/browse/SPARK-23519 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: Franck Tago >Assignee: hemanth meka >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2018-05-10-10-48-57-259.png > > > 1- create and populate a hive table . I did this in a hive cli session .[ > not that this matters ] > create table atable (col1 int) ; > insert into atable values (10 ) , (100) ; > 2. create a view from the table. > [These actions were performed from a spark shell ] > spark.sql("create view default.aview (int1 , int2 ) as select col1 , col1 > from atable ") > java.lang.AssertionError: assertion failed: The view output (col1,col1) > contains duplicate column name. > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.execution.command.ViewHelper$.generateViewProperties(views.scala:361) > at > org.apache.spark.sql.execution.command.CreateViewCommand.prepareTable(views.scala:236) > at > org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:174) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:183) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
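A compact reproduction in spark-shell form, together with the obvious workaround of aliasing the duplicated column so the analyzed view output has unique names (a sketch of pre-fix behavior, using a non-Hive table for brevity):

{code:lang=scala}
spark.sql("create table atable (col1 int) using parquet")
spark.sql("insert into atable values (10), (100)")

// Fails before the fix: the view output (col1,col1) has duplicate names.
// spark.sql("create view aview (int1, int2) as select col1, col1 from atable")

// Workaround: give each projected column a distinct alias, so the view
// output becomes (c1, c2) and the duplicate-name assertion passes.
spark.sql("create view aview (int1, int2) as select col1 as c1, col1 as c2 from atable")
{code}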
[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920302#comment-16920302 ] Liang-Chi Hsieh commented on SPARK-23519: - This was closed and then reopened and fixed. The label [bulk-closed|https://issues.apache.org/jira/issues/?jql=labels+%3D+bulk-closed] does not look correct, so I removed it. Feel free to add it back if I misunderstood. > Create View Commands Fails with The view output (col1,col1) contains > duplicate column name > --- > > Key: SPARK-23519 > URL: https://issues.apache.org/jira/browse/SPARK-23519 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.2.1 >Reporter: Franck Tago >Assignee: hemanth meka >Priority: Major > Labels: bulk-closed > Fix For: 3.0.0 > > Attachments: image-2018-05-10-10-48-57-259.png > > > 1- create and populate a hive table . I did this in a hive cli session .[ > not that this matters ] > create table atable (col1 int) ; > insert into atable values (10 ) , (100) ; > 2. create a view from the table. > [These actions were performed from a spark shell ] > spark.sql("create view default.aview (int1 , int2 ) as select col1 , col1 > from atable ") > java.lang.AssertionError: assertion failed: The view output (col1,col1) > contains duplicate column name. > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.execution.command.ViewHelper$.generateViewProperties(views.scala:361) > at > org.apache.spark.sql.execution.command.CreateViewCommand.prepareTable(views.scala:236) > at > org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:174) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:183) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Labels: (was: bulk-closed) > Create View Commands Fails with The view output (col1,col1) contains > duplicate column name > --- > > Key: SPARK-23519 > URL: https://issues.apache.org/jira/browse/SPARK-23519 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.2.1 >Reporter: Franck Tago >Assignee: hemanth meka >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2018-05-10-10-48-57-259.png > > > 1- create and populate a hive table . I did this in a hive cli session .[ > not that this matters ] > create table atable (col1 int) ; > insert into atable values (10 ) , (100) ; > 2. create a view from the table. > [These actions were performed from a spark shell ] > spark.sql("create view default.aview (int1 , int2 ) as select col1 , col1 > from atable ") > java.lang.AssertionError: assertion failed: The view output (col1,col1) > contains duplicate column name. > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.execution.command.ViewHelper$.generateViewProperties(views.scala:361) > at > org.apache.spark.sql.execution.command.CreateViewCommand.prepareTable(views.scala:236) > at > org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:174) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:183) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28612) DataSourceV2: Add new DataFrameWriter API for v2
[ https://issues.apache.org/jira/browse/SPARK-28612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz resolved SPARK-28612. - Fix Version/s: 3.0.0 Resolution: Fixed Resolved by [https://github.com/apache/spark/pull/25354] > DataSourceV2: Add new DataFrameWriter API for v2 > > > Key: SPARK-28612 > URL: https://issues.apache.org/jira/browse/SPARK-28612 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Major > Fix For: 3.0.0 > > > This tracks adding an API like the one proposed in SPARK-23521: > {code:lang=scala} > df.writeTo("catalog.db.table").append() // AppendData > df.writeTo("catalog.db.table").overwriteDynamic() // > OverwritePartitionsDynamic > df.writeTo("catalog.db.table").overwrite($"date" === '2019-01-01') // > OverwriteByExpression > df.writeTo("catalog.db.table").partitionBy($"type", $"date").create() // CTAS > df.writeTo("catalog.db.table").replace() // RTAS > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28612) DataSourceV2: Add new DataFrameWriter API for v2
[ https://issues.apache.org/jira/browse/SPARK-28612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz reassigned SPARK-28612: --- Assignee: Ryan Blue > DataSourceV2: Add new DataFrameWriter API for v2 > > > Key: SPARK-28612 > URL: https://issues.apache.org/jira/browse/SPARK-28612 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Major > > This tracks adding an API like the one proposed in SPARK-23521: > {code:lang=scala} > df.writeTo("catalog.db.table").append() // AppendData > df.writeTo("catalog.db.table").overwriteDynamic() // > OverwritePartitionsDynamic > df.writeTo("catalog.db.table").overwrite($"date" === '2019-01-01') // > OverwriteByExpression > df.writeTo("catalog.db.table").partitionBy($"type", $"date").create() // CTAS > df.writeTo("catalog.db.table").replace() // RTAS > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed
[ https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920234#comment-16920234 ] Wing Yew Poon commented on SPARK-28770: --- I looked into the issue further. In EventLoggingListener, almost all calls to logEvent (to write serialized JSON to the event log) are as a direct result of an onXXX method being called. The exception is that within onStageCompleted, before calling logEvent with the SparkListenerStageCompleted event, if we are logging stage executor metrics, there is a bulk call to logEvent with SparkListenerStageExecutorMetrics events via a Map.foreach. This Map.foreach bulk operation may not log the events in the same order from run to run. This is also the only place where SparkListenerStageExecutorMetrics events get logged. For this reason, I think the affected tests ("End-to-end replay" and "End-to-end replay with compression", both implemented by calling testApplicationReplay) should not compare the SparkListenerStageExecutorMetrics events. That should eliminate the indeterminacy of the tests. > Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression > failed > --- > > Key: SPARK-28770 > URL: https://issues.apache.org/jira/browse/SPARK-28770 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 2.4.3 > Environment: Community jenkins and our arm testing instance. >Reporter: huangtianhua >Priority: Major > > Test > org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with > compression is failed see > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/] > > And also the test is failed on arm instance, I sent email to spark-dev > before, and we suspect there is something related with the commit > [https://github.com/apache/spark/pull/23767], we tried to revert it and the > tests are passed: > ReplayListenerSuite: > - ... > - End-to-end replay *** FAILED *** > "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) > - End-to-end replay with compression *** FAILED *** > "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) > > Not sure what's wrong, hope someone can help to figure it out, thanks very > much. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
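A sketch of the proposed change to the test, assuming hypothetical originalEvents/replayedEvents sequences standing in for what testApplicationReplay collects: drop the order-nondeterministic metrics events from both sides before comparing.

{code:lang=scala}
import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerStageExecutorMetrics}

// Remove the events whose relative order is not deterministic.
def comparable(events: Seq[SparkListenerEvent]): Seq[SparkListenerEvent] =
  events.filterNot(_.isInstanceOf[SparkListenerStageExecutorMetrics])

val originalEvents: Seq[SparkListenerEvent] = Seq.empty // stand-in
val replayedEvents: Seq[SparkListenerEvent] = Seq.empty // stand-in
assert(comparable(originalEvents) == comparable(replayedEvents))
{code}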
[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920227#comment-16920227 ] Dongjoon Hyun commented on SPARK-28921: --- BTW, [~andygrove]. I tried to add your PR to this, but it seems to be already there, doesn't it? 25640? > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10) > - > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Priority: Major > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
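For anyone hitting this before the upgrade lands, the remedy described in the issue maps to a one-line dependency bump; an sbt sketch (the exact version your build needs may differ):

{code:lang=scala}
// Pin a kubernetes-client that handles the new websocket authentication
// behavior; 4.4.2 is the version suggested in the issue text.
libraryDependencies += "io.fabric8" % "kubernetes-client" % "4.4.2"
{code}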
[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920226#comment-16920226 ] Dongjoon Hyun commented on SPARK-28921: --- [~psschwei] and [~andygrove]. BTW, do you know how many production clusters are exposed to those versions? Maybe at least in EKS/AKS/GKE, since they are popular managed services. > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10) > - > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Priority: Major > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28921: -- Priority: Major (was: Critical) > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10) > - > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Priority: Major > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28938) Kubernetes using unsupported docker image
Rodney Aaron Stainback created SPARK-28938: -- Summary: Kubernetes using unsupported docker image Key: SPARK-28938 URL: https://issues.apache.org/jira/browse/SPARK-28938 Project: Spark Issue Type: New Feature Components: Kubernetes Affects Versions: 2.4.3, 3.0.0 Environment: Kubernetes Reporter: Rodney Aaron Stainback The current docker image used by Kubernetes {code:java} openjdk:8-alpine{code} is not supported [https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links] It was removed with this commit [https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d134732c33f7b8368e099] Quote from commit "4. no more OpenJDK 8 Alpine images (Alpine/musl is not officially supported by the OpenJDK project, so this reflects that -- see "Project Portola" for the Alpine porting efforts which I understand are still in need of help)" Please move to a supported image for Kubernetes -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28803) Document DESCRIBE TABLE in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-28803. - Fix Version/s: 3.0.0 Assignee: Dilip Biswal Resolution: Fixed > Document DESCRIBE TABLE in SQL Reference. > - > > Key: SPARK-28803 > URL: https://issues.apache.org/jira/browse/SPARK-28803 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
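For orientation, the statement the new page documents, in spark-shell form (the table name is illustrative):

{code:lang=scala}
spark.sql("DESCRIBE TABLE EXTENDED salesdb.customers").show(truncate = false)
{code}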
[jira] [Commented] (SPARK-25994) SPIP: Property Graphs, Cypher Queries, and Algorithms
[ https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920213#comment-16920213 ] Ruben Berenguel commented on SPARK-25994: - Hi [~mju], I’ve had a series of unforeseen increases in “stuff” that are preventing me from doing much open source work. For now I’ll stay as an interested bystander, if I manage to find time I’ll step in. > SPIP: Property Graphs, Cypher Queries, and Algorithms > - > > Key: SPARK-25994 > URL: https://issues.apache.org/jira/browse/SPARK-25994 > Project: Spark > Issue Type: Epic > Components: Graph >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Martin Junghanns >Priority: Major > Labels: SPIP > > Copied from the SPIP doc: > {quote} > GraphX was one of the foundational pillars of the Spark project, and is the > current graph component. This reflects the importance of the graphs data > model, which naturally pairs with an important class of analytic function, > the network or graph algorithm. > However, GraphX is not actively maintained. It is based on RDDs, and cannot > exploit Spark 2’s Catalyst query engine. GraphX is only available to Scala > users. > GraphFrames is a Spark package, which implements DataFrame-based graph > algorithms, and also incorporates simple graph pattern matching with fixed > length patterns (called “motifs”). GraphFrames is based on DataFrames, but > has a semantically weak graph data model (based on untyped edges and > vertices). The motif pattern matching facility is very limited by comparison > with the well-established Cypher language. > The Property Graph data model has become quite widespread in recent years, > and is the primary focus of commercial graph data management and of graph > data research, both for on-premises and cloud data management. Many users of > transactional graph databases also wish to work with immutable graphs in > Spark. > The idea is to define a Cypher-compatible Property Graph type based on > DataFrames; to replace GraphFrames querying with Cypher; to reimplement > GraphX/GraphFrames algos on the PropertyGraph type. > To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), > reusing existing proven designs and code, will be employed in Spark 3.0. This > graph query processor, like CAPS, will overlay and drive the SparkSQL > Catalyst query engine, using the CAPS graph query planner. > {quote} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed
[ https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920212#comment-16920212 ] Wing Yew Poon commented on SPARK-28770: --- On my branch from which [https://github.com/apache/spark/pull/23767] was merged into master, I modified ReplayListenerSuite following [https://gist.github.com/dwickern/6ba9c5c505d2325d3737ace059302922], and ran "End-to-end replay with compression" 100 times. I encountered no failures. I ran this on my MacBook Pro. The instance of failure that Jungtaek cited appears to be due to a comparison of two SparkListenerStageExecutorMetrics events (one from the original, the other from the replay) failing. One event came from the driver and the other came from executor "1". SparkListenerStageExecutorMetrics events are logged at stage completion if spark.eventLog.logStageExecutorMetrics.enabled is set to true. The failure could be due to these events being in a different order in the replay than in the original. In the commit that first introduced these events, in ReplayListenerSuite, there was some code to filter out these events in the testApplicationReplay method of ReplayListenerSuite. (The code was to filter out the events from the original, not from the replay, which I didn't understand.) Maybe we could filter out the SparkListenerStageExecutorMetrics events (from both original and replay) in testApplicationReplay (which is called by "End-to-end replay" and "End-to-end replay with compression"), to avoid this flakiness. > Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression > failed > --- > > Key: SPARK-28770 > URL: https://issues.apache.org/jira/browse/SPARK-28770 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 2.4.3 > Environment: Community jenkins and our arm testing instance. >Reporter: huangtianhua >Priority: Major > > Test > org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with > compression is failed see > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/] > > And also the test is failed on arm instance, I sent email to spark-dev > before, and we suspect there is something related with the commit > [https://github.com/apache/spark/pull/23767], we tried to revert it and the > tests are passed: > ReplayListenerSuite: > - ... > - End-to-end replay *** FAILED *** > "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) > - End-to-end replay with compression *** FAILED *** > "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) > > Not sure what's wrong, hope someone can help to figure it out, thanks very > much. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27907) HiveUDAF should return NULL in case of 0 rows
[ https://issues.apache.org/jira/browse/SPARK-27907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27907: -- Labels: correctness (was: ) > HiveUDAF should return NULL in case of 0 rows > - > > Key: SPARK-27907 > URL: https://issues.apache.org/jira/browse/SPARK-27907 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Ajith S >Assignee: Ajith S >Priority: Blocker > Labels: correctness > Fix For: 2.3.4, 2.4.4, 3.0.0 > > > When a query returns zero rows, the HiveUDAFFunction throws an NPE > CASE 1: > create table abc(a int) > select histogram_numeric(a,2) from abc // NPE > Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost, executor > driver): java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveUDAFFunction.eval(hiveUDFs.scala:471) > at org.apache.spark.sql.hive.HiveUDAFFunction.eval(hiveUDFs.scala:315) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.eval(interfaces.scala:543) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$5(AggregationIterator.scala:231) > at > org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:122) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:425) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1350) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:428) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > CASE 2: > create table abc(a int) > insert into abc values (1) > select histogram_numeric(a,2) from abc where a=3 //NPE > Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 4.0 (TID 5, localhost, executor > driver): java.lang.NullPointerException > at > org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:477) > at > org.apache.spark.sql.hive.HiveUDAFFunction.serialize(hiveUDFs.scala:315) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:570) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$6(AggregationIterator.scala:254) > at > org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.outputForEmptyGroupingKeyWithoutInput(ObjectAggregationIterator.scala:97) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:132) > at > org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:839) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:839) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) > at
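Both cases from the description, condensed into spark-shell form (this assumes a Hive-enabled session, since histogram_numeric resolves through the Hive UDAF path); after the fix each query returns NULL instead of throwing:

{code:lang=scala}
spark.sql("create table abc (a int)")
// Case 1: aggregate over an empty table.
spark.sql("select histogram_numeric(a, 2) from abc").show()
spark.sql("insert into abc values (1)")
// Case 2: aggregate over zero matching rows.
spark.sql("select histogram_numeric(a, 2) from abc where a = 3").show()
{code}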
[jira] [Updated] (SPARK-28871) Some codes in 'Policy for handling multiple watermarks' does not show friendly
[ https://issues.apache.org/jira/browse/SPARK-28871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28871: -- Labels: (was: documentation) > Some codes in 'Policy for handling multiple watermarks' does not show > friendly > --- > > Key: SPARK-28871 > URL: https://issues.apache.org/jira/browse/SPARK-28871 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.3 >Reporter: chaiyongqiang >Assignee: chaiyongqiang >Priority: Minor > Fix For: 2.4.4, 3.0.0 > > Attachments: Policy_for_handling_multiple_watermarks.png > > > The codes in the 'Policy for handling multiple watermarks' in > structured-streaming-programming-guide does not show friendly. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28542) Document Stages page
[ https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-28542: -- Fix Version/s: 3.0.0 > Document Stages page > > > Key: SPARK-28542 > URL: https://issues.apache.org/jira/browse/SPARK-28542 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28542) Document Stages page
[ https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-28542: - Fix Version/s: (was: 3.0.0) Assignee: (was: Pablo Langa Blanco) Priority: Minor (was: Major) > Document Stages page > > > Key: SPARK-28542 > URL: https://issues.apache.org/jira/browse/SPARK-28542 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28542) Document Stages page
[ https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28542. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25598 [https://github.com/apache/spark/pull/25598] > Document Stages page > > > Key: SPARK-28542 > URL: https://issues.apache.org/jira/browse/SPARK-28542 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Pablo Langa Blanco >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28542) Document Stages page
[ https://issues.apache.org/jira/browse/SPARK-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-28542: - Assignee: Pablo Langa Blanco > Document Stages page > > > Key: SPARK-28542 > URL: https://issues.apache.org/jira/browse/SPARK-28542 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Pablo Langa Blanco >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28932) Maven install fails on JDK11
[ https://issues.apache.org/jira/browse/SPARK-28932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28932. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25638 [https://github.com/apache/spark/pull/25638] > Maven install fails on JDK11 > > > Key: SPARK-28932 > URL: https://issues.apache.org/jira/browse/SPARK-28932 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.0.0 > > > {code} > mvn clean install -pl common/network-common -DskipTests > error: fatal error: object scala in compiler mirror not found. > one error found > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28916) Generated SpecificSafeProjection.apply method grows beyond 64 KB when use SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-28916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920153#comment-16920153 ] Marco Gaido commented on SPARK-28916: - I think the problem is related to subexpression elimination. I've not been able to confirm, since for some reason I am not able to disable it: even though I set the config to false, it is performed anyway. Maybe I am missing something there. Anyway, you may try and set {{spark.sql.subexpressionElimination.enabled}} to {{false}}. Meanwhile I am working on a fix. Thanks. > Generated SpecificSafeProjection.apply method grows beyond 64 KB when use > SparkSQL > --- > > Key: SPARK-28916 > URL: https://issues.apache.org/jira/browse/SPARK-28916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1, 2.4.3 >Reporter: MOBIN >Priority: Major > > Can be reproduced by the following steps: > 1. Create a table with 5000 fields > 2. val data=spark.sql("select * from spark64kb limit 10"); > 3. data.describe() > Then, the following error occurred > {code:java} > WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, localhost, > executor 1): org.codehaus.janino.InternalCompilerException: failed to > compile: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection" > grows beyond 64 KB > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1298) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at > org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44) > at > org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:96) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:95) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:180) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.(AggregationIterator.scala:199) > at > org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.(SortBasedAggregationIterator.scala:40) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:86) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(T
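The workaround suggested in the comment, expressed as a runtime setting in spark-shell form; the flag is internal, so treat it as a diagnostic knob rather than a supported fix.

{code:lang=scala}
spark.conf.set("spark.sql.subexpressionElimination.enabled", "false")
// Then re-run the failing reproduction:
val data = spark.sql("select * from spark64kb limit 10")
data.describe()
{code}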
[jira] [Updated] (SPARK-28864) Add Spark source connector for Aliyun Log Service
[ https://issues.apache.org/jira/browse/SPARK-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Li updated SPARK-28864: -- Description: Alibaba Log Service is a big data service which has been widely used in Alibaba Group and by thousands of customers of Alibaba Cloud. The core storage engine of Log Service is named Loghub, a large-scale distributed storage system that provides producer and consumer APIs to push and pull data, much as Kafka, AWS Kinesis and Azure Event Hubs do. Many users of Log Service use Spark Streaming, Spark SQL and Spark Structured Streaming to analyze data collected from both on-premises and cloud data sources. Happy to hear any comments. was: Aliyun Log Service is a big data service which has been widely used in Alibaba Group and by thousands of customers of Alibaba Cloud. The core storage engine of Log Service is named Loghub, a large-scale distributed storage system that provides producer and consumer APIs to push and pull data, much as Kafka, AWS Kinesis and Azure Event Hubs do. Many users of Log Service use Spark Streaming, Spark SQL and Spark Structured Streaming to analyze data collected from both on-premises and cloud data sources. Happy to hear any comments. > Add Spark source connector for Aliyun Log Service > - > > Key: SPARK-28864 > URL: https://issues.apache.org/jira/browse/SPARK-28864 > Project: Spark > Issue Type: New Feature > Components: Input/Output >Affects Versions: 3.0.0 >Reporter: Ke Li >Priority: Major > > Alibaba Log Service is a big data service which has been widely used in > Alibaba Group and by thousands of customers of Alibaba Cloud. The core storage > engine of Log Service is named Loghub, a large-scale distributed storage > system that provides producer and consumer APIs to push and pull data, much > as Kafka, AWS Kinesis and Azure Event Hubs do. > Many users of Log Service use Spark Streaming, Spark SQL and Spark Structured > Streaming to analyze data collected from both on-premises and cloud data > sources. > Happy to hear any comments. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
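Purely as an illustration of the kind of user-facing API such a connector could expose (the format name and every option key below are hypothetical, not part of this proposal):
{code:scala}
// Hypothetical sketch only: "loghub" and all option keys are invented for
// illustration; the proposal does not specify the user-facing API.
val stream = spark.readStream
  .format("loghub")
  .option("project", "my-project")                     // hypothetical
  .option("logstore", "my-logstore")                   // hypothetical
  .option("endpoint", "cn-hangzhou.log.aliyuncs.com")  // hypothetical
  .load()
{code}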
[jira] [Resolved] (SPARK-28903) Fix AWS SDK version conflict that breaks Pyspark Kinesis tests
[ https://issues.apache.org/jira/browse/SPARK-28903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28903. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 25559 [https://github.com/apache/spark/pull/25559] > Fix AWS SDK version conflict that breaks Pyspark Kinesis tests > -- > > Key: SPARK-28903 > URL: https://issues.apache.org/jira/browse/SPARK-28903 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0, 2.4.3 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > The Pyspark Kinesis tests are failing, at least in master: > {code} > == > ERROR: test_kinesis_stream > (pyspark.streaming.tests.test_kinesis.KinesisStreamTests) > -- > Traceback (most recent call last): > File > "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py", > line 44, in test_kinesis_stream > kinesisTestUtils = > self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2) > File > "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", > line 1554, in __call__ > answer, self._gateway_client, None, self._fqn) > File > "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", > line 328, in get_return_value > format(target_id, ".", name), value) > Py4JJavaError: An error occurred while calling > None.org.apache.spark.streaming.kinesis.KinesisTestUtils. > : java.lang.NoSuchMethodError: > com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection; > at > org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211) > at > org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211) > at scala.collection.Iterator.find(Iterator.scala:993) > at scala.collection.Iterator.find$(Iterator.scala:990) > at scala.collection.AbstractIterator.find(Iterator.scala:1429) > at scala.collection.IterableLike.find(IterableLike.scala:81) > at scala.collection.IterableLike.find$(IterableLike.scala:80) > at scala.collection.AbstractIterable.find(Iterable.scala:56) > at > org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211) > at > org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46) > ... > {code} > The non-Python Kinesis tests are fine though. It turns out that this is > because Pyspark tests use the output of the Spark assembly, which pulls in > hadoop-cloud, which in turn pulls in an old AWS Java SDK. > Per [~ste...@apache.org], it seems like we can just resolve this by excluding > the aws-java-sdk dependency. See the attached PR for some more detail about > the debugging and other options. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
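For third-party builds hitting the same conflict, the general shape of the fix is a dependency exclusion; a build.sbt sketch, under the assumption that the stale SDK arrives via the spark-hadoop-cloud artifact (the actual change to Spark's own build is in pull request 25559):
{code:scala}
// build.sbt sketch (illustrative): keep the hadoop-cloud integration but
// exclude the stale transitive AWS SDK, so the Kinesis module's newer
// aws-java-sdk is the one that ends up on the classpath.
libraryDependencies += ("org.apache.spark" %% "spark-hadoop-cloud" % "3.0.0")
  .exclude("com.amazonaws", "aws-java-sdk")
{code}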
[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920133#comment-16920133 ] Andy Grove commented on SPARK-28921: Here's a PR with the fix against the master branch, since it didn't automatically link to this JIRA: https://github.com/apache/spark/pull/25640 > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10) > - > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Priority: Critical > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE: > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
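Until the upgrade lands in a Spark release, a downstream build can pin the fixed client version itself; a build.sbt sketch for illustration:
{code:scala}
// build.sbt sketch (illustrative): force the fabric8 client to the version
// carrying the fix for the HTTP 403 websocket failure shown above.
dependencyOverrides += "io.fabric8" % "kubernetes-client" % "4.4.2"
{code}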
[jira] [Commented] (SPARK-28916) Generated SpecificSafeProjection.apply method grows beyond 64 KB when using Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-28916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920074#comment-16920074 ] Marco Gaido commented on SPARK-28916: - Thanks for reporting this. I am looking into it. > Generated SpecificSafeProjection.apply method grows beyond 64 KB when using > Spark SQL > --- > > Key: SPARK-28916 > URL: https://issues.apache.org/jira/browse/SPARK-28916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1, 2.4.3 >Reporter: MOBIN >Priority: Major > > Can be reproduced by the following steps: > 1. Create a table with 5000 fields > 2. val data=spark.sql("select * from spark64kb limit 10"); > 3. data.describe() > Then, the following error occurred > {code:java} > WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, localhost, > executor 1): org.codehaus.janino.InternalCompilerException: failed to > compile: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection" > grows beyond 64 KB > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1298) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at > org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44) > at > org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:96) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:95) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:180) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.<init>(AggregationIterator.scala:199) > at > org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.<init>(SortBasedAggregationIterator.scala:40) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:86) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class > "
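The repro does not strictly need a 5000-field table; a minimal sketch that builds an equally wide DataFrame in-line (assuming an active {{SparkSession}} named {{spark}}) should exercise the same code path:
{code:scala}
import org.apache.spark.sql.functions.lit

// Stand-in for the 5000-field table: a 5000-column DataFrame built in-line.
val cols = (1 to 5000).map(i => lit(i).as(s"c$i"))
val data = spark.range(10).select(cols: _*)

// describe() generates a single projection over all columns, which is what
// pushes the generated apply() method past the 64 KB bytecode limit.
data.describe().show()
{code}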
[jira] [Commented] (SPARK-28934) Add `spark.sql.compatiblity.mode`
[ https://issues.apache.org/jira/browse/SPARK-28934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920072#comment-16920072 ] Marco Gaido commented on SPARK-28934: - Hi [~smilegator]! Thanks for opening this. I am wondering whether it may be worth reopening SPARK-28610 and enabling the option under the pgSQL compatibility mode. [~cloud_fan] what do you think? > Add `spark.sql.compatiblity.mode` > - > > Key: SPARK-28934 > URL: https://issues.apache.org/jira/browse/SPARK-28934 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > This issue aims to add `spark.sql.compatiblity.mode` whose values are `spark` > or `pgSQL` case-insensitively to control PostgreSQL compatibility features. > > Apache Spark 3.0.0 can start with `spark.sql.parser.ansi.enabled=false` and > `spark.sql.compatiblity.mode=spark`. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
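If the proposal lands as described, usage would presumably look like the following sketch (the config does not exist yet; the key is spelled exactly as in the ticket):
{code:scala}
// Sketch only: this config is proposed in this ticket, not yet released.
spark.conf.set("spark.sql.compatiblity.mode", "pgSQL") // or "spark", case-insensitive
spark.conf.set("spark.sql.parser.ansi.enabled", "false")
{code}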
[jira] [Updated] (SPARK-28908) Structured Streaming Kafka sink support Exactly-Once semantics
[ https://issues.apache.org/jira/browse/SPARK-28908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wenxuanguan updated SPARK-28908: Attachment: (was: Kafka Sink Exactly-Once Semantics Design Sketch.pdf) > Structured Streaming Kafka sink support Exactly-Once semantics > -- > > Key: SPARK-28908 > URL: https://issues.apache.org/jira/browse/SPARK-28908 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: wenxuanguan >Priority: Major > Attachments: Kafka Sink Exactly-Once Semantics Design Sketch V1.pdf > > > Since Apache Kafka supports transactions as of 0.11.0.0, we can implement Kafka > sink exactly-once semantics with a transactional Kafka producer -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28908) Structured Streaming Kafka sink support Exactly-Once semantics
[ https://issues.apache.org/jira/browse/SPARK-28908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wenxuanguan updated SPARK-28908: Attachment: Kafka Sink Exactly-Once Semantics Design Sketch V1.pdf > Structured Streaming Kafka sink support Exactly-Once semantics > -- > > Key: SPARK-28908 > URL: https://issues.apache.org/jira/browse/SPARK-28908 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: wenxuanguan >Priority: Major > Attachments: Kafka Sink Exactly-Once Semantics Design Sketch V1.pdf, > Kafka Sink Exactly-Once Semantics Design Sketch.pdf > > > Since Apache Kafka supports transactions as of 0.11.0.0, we can implement Kafka > sink exactly-once semantics with a transactional Kafka producer -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
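For background, the Kafka building block the design relies on is the transactional producer; a minimal standalone sketch using the plain Kafka client API (not the proposed sink code; broker address, topic, and transactional.id are placeholders):
{code:scala}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder broker address
props.put("transactional.id", "demo-sink-1")     // must be unique and stable per producer
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.initTransactions()
try {
  producer.beginTransaction()
  producer.send(new ProducerRecord("demo-topic", "key", "value"))
  producer.commitTransaction() // visible to read_committed consumers only after this
} catch {
  case e: Exception =>
    producer.abortTransaction() // uncommitted records are discarded
    throw e
} finally {
  producer.close()
}
{code}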