[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server
[ https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897997#comment-15897997 ]

Imran Rashid commented on SPARK-19796:

I'm opposed to (b) as well. It feels wrong to do a one-off just for JOB_DESCRIPTION, but maybe it's a large enough savings that it's worth doing. I was thinking of something larger, along the lines of SPARK-19108. Another option would be to add new APIs, e.g., jobs would take `driverProperties` and `executorProperties`, but maybe that is overkill.

> taskScheduler fails serializing long statements received by thrift server
>
> Key: SPARK-19796
> URL: https://issues.apache.org/jira/browse/SPARK-19796
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Giambattista
> Assignee: Imran Rashid
> Priority: Blocker
> Fix For: 2.2.0
>
> This problem was observed after the changes made for SPARK-17931.
> In my use case I'm sending very long insert statements to Spark thrift server
> and they are failing at TaskDescription.scala:89 because writeUTF fails if
> requested to write strings longer than 64 KB (see
> https://www.drillio.com/en/2009/java-encoded-string-too-long-64kb-limit/ for
> a description of the issue).
> As suggested by Imran Rashid I tracked down the offending key: it is
> "spark.job.description" and it contains the complete SQL statement.
> The problem can be reproduced by creating a table like:
> create table test (a int) using parquet
> and by sending an insert statement like:
> scala> val r = 1 to 128000
> scala> println("insert into table test values (" + r.mkString("),(") + ")")

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
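The 64 KB limit comes from `java.io.DataOutputStream.writeUTF`, which prefixes the string with an unsigned 16-bit length and throws `UTFDataFormatException` when the encoded string exceeds 65535 bytes. The sketch below reproduces that failure with the statement from the reproduction steps and shows one way around it: write an `Int` byte count followed by the raw UTF-8 bytes. The helper names `writeLongString` and `readLongString` are hypothetical; this illustrates the technique and is not necessarily the exact patch merged for this issue.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, UTFDataFormatException}
import java.nio.charset.StandardCharsets

object WriteUtfLimit {
  // Hypothetical helper: a 4-byte Int length prefix instead of writeUTF's 2-byte one.
  def writeLongString(out: DataOutputStream, s: String): Unit = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  // Hypothetical helper: reads back exactly what writeLongString wrote.
  def readLongString(in: DataInputStream): String = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  // The long INSERT statement from the reproduction steps.
  val statement: String = {
    val r = 1 to 128000
    "insert into table test values (" + r.mkString("),(") + ")"
  }

  // True if writeUTF rejects the string (it does once the encoding exceeds 65535 bytes).
  def writeUtfFails(s: String): Boolean =
    try {
      new DataOutputStream(new ByteArrayOutputStream()).writeUTF(s)
      false
    } catch {
      case _: UTFDataFormatException => true
    }

  // Serializes with writeLongString and deserializes with readLongString.
  def roundTrip(s: String): String = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    writeLongString(out, s)
    out.flush()
    readLongString(new DataInputStream(new ByteArrayInputStream(buf.toByteArray)))
  }

  def main(args: Array[String]): Unit = {
    println(s"statement length: ${statement.length}")
    println(s"writeUTF fails: ${writeUtfFails(statement)}")
    println(s"round trip ok: ${roundTrip(statement) == statement}")
  }
}
```

Keys are typically short, so a scheme like this could keep `writeUTF` for keys and use the length-prefixed form only for values.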
[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server
[ https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893736#comment-15893736 ]

Shivaram Venkataraman commented on SPARK-19796:

I think (a) is worth exploring in a new JIRA. We should try to avoid sending data that we don't need on the executors during task execution.
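Option (a) from Kay Ousterhout's comment, not shipping driver-only data such as `spark.job.description` to executors, could look roughly like the sketch below. It partitions a task's `java.util.Properties` into a driver-side copy and an executor-side copy before serialization, which is also close in spirit to the separate `driverProperties`/`executorProperties` idea raised elsewhere in this thread. `DriverOnlyKeys` and `splitProperties` are hypothetical names, and deciding which keys are safe to withhold is exactly the compatibility question being debated.

```scala
import java.util.Properties

object SplitProperties {
  // Hypothetical set of keys that only the driver (for example, the web UI) reads.
  val DriverOnlyKeys: Set[String] = Set("spark.job.description")

  // Splits props into (driverOnly, executorVisible) copies; the input is not modified.
  def splitProperties(props: Properties): (Properties, Properties) = {
    val driverOnly = new Properties()
    val executorVisible = new Properties()
    val it = props.stringPropertyNames().iterator()
    while (it.hasNext) {
      val key = it.next()
      val target = if (DriverOnlyKeys.contains(key)) driverOnly else executorVisible
      target.setProperty(key, props.getProperty(key))
    }
    (driverOnly, executorVisible)
  }
}
```

Under this assumption only `executorVisible` would be serialized into each task, while `driverOnly` stays with the job on the driver.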
[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server
[ https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893584#comment-15893584 ]

Mridul Muralidharan commented on SPARK-19796:

I would not prefer (b): if we are worried that users are depending on a private property, sending a truncated version of it would only aggravate the problem! I would rather fail fast with a missing value.

Having said that, while we should limit our internal usage of properties, this map is also used to propagate user-specified key-value pairs, so adding limits or log messages might not be optimal. Worst case, if we start detecting that the properties map is growing really large, we could broadcast it (ugh?).
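The "broadcast it if it grows really large" fallback needs some way to detect that the properties map has become large. A hypothetical size check might sum the UTF-8 byte lengths of all keys and values and compare the total against a threshold. The 64 KB threshold and the `PropertiesSize` helper are assumptions for illustration, and the broadcast step itself is not shown.

```scala
import java.nio.charset.StandardCharsets
import java.util.Properties

object PropertiesSize {
  // Hypothetical threshold above which properties might be shipped some other way.
  val LargeThresholdBytes: Long = 64L * 1024

  // Approximate encoded size: UTF-8 bytes of every key and value (length prefixes ignored).
  def encodedSize(props: Properties): Long = {
    var total = 0L
    val it = props.stringPropertyNames().iterator()
    while (it.hasNext) {
      val key = it.next()
      total += key.getBytes(StandardCharsets.UTF_8).length
      total += props.getProperty(key).getBytes(StandardCharsets.UTF_8).length
    }
    total
  }

  // True if the estimated size exceeds the hypothetical threshold.
  def isLarge(props: Properties): Boolean = encodedSize(props) > LargeThresholdBytes
}
```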
[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server
[ https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893376#comment-15893376 ]

Kay Ousterhout commented on SPARK-19796:

Do you think we should (separately) fix the underlying problem? Specifically, we could:

(a) Not send the SPARK_JOB_DESCRIPTION property to the workers, since it's only used on the master for the UI (and while users *could* access it, the variable name SPARK_JOB_DESCRIPTION is spark-private, which suggests that it shouldn't be used by users). Perhaps this is too risky because users could be using it?

(b) Truncate SPARK_JOB_DESCRIPTION to something reasonable (100 characters?) before sending it to the workers. This is more backwards compatible if users are actually reading the property, but maybe a useless intermediate approach?

(c) (Possibly in addition to one of the above) Log a warning if any of the properties is longer than 100 characters (or some other threshold).

Thoughts? I can file a JIRA if you think any of these is worthwhile.
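Options (b) and (c) are simple to express. The sketch below truncates a value to a code-point limit (100, the figure floated in the comment) without splitting a surrogate pair, and collects the keys whose values exceed a threshold so that a caller could log a warning. `truncateValue` and `oversizedKeys` are hypothetical names, and the actual logging call is left to the caller.

```scala
import java.util.Properties

object TruncateAndWarn {
  // Option (b): keep at most maxCodePoints code points, never cutting a surrogate pair in half.
  def truncateValue(value: String, maxCodePoints: Int = 100): String =
    if (value.codePointCount(0, value.length) <= maxCodePoints) value
    else value.substring(0, value.offsetByCodePoints(0, maxCodePoints))

  // Option (c): keys whose values are longer than `threshold` chars, sorted for stable output.
  def oversizedKeys(props: Properties, threshold: Int = 100): Seq[String] = {
    val result = Seq.newBuilder[String]
    val it = props.stringPropertyNames().iterator()
    while (it.hasNext) {
      val key = it.next()
      if (props.getProperty(key).length > threshold) result += key
    }
    result.result().sorted
  }
}
```

As Mridul Muralidharan points out, a truncated value may be worse than a missing one for anyone who reads it, so (b) trades correctness for compatibility.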
[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server
[ https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892547#comment-15892547 ]

Imran Rashid commented on SPARK-19796:

[~kayousterhout] [~shivaram] Here's another example of serializing lots of pointless data in each task: in this case, {{TaskDescription.properties}} contains lots of data which the executors don't care about, and this gets serialized once per task. For this JIRA, I'll just do a small fix, but I thought you might be interested in this.
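To make "serialized once per task" concrete: when the reproduction statement from the issue description is stored in `spark.job.description`, every task description carries the full statement, which is about 1 MB. The sketch computes its UTF-8 size and multiplies it by a task count; the 1,000-task figure is an assumption chosen for illustration, not a measurement from this issue.

```scala
import java.nio.charset.StandardCharsets

object PerTaskOverhead {
  // The INSERT statement built in the reproduction steps.
  val statement: String = "insert into table test values (" + (1 to 128000).mkString("),(") + ")"

  // UTF-8 size of the statement; it is pure ASCII, so bytes equal characters here.
  val bytesPerTask: Long = statement.getBytes(StandardCharsets.UTF_8).length.toLong

  // Total bytes of description data shipped for a hypothetical number of tasks.
  def totalBytes(numTasks: Int): Long = bytesPerTask * numTasks

  def main(args: Array[String]): Unit = {
    val tasks = 1000
    println(f"$bytesPerTask%d bytes per task, ${totalBytes(tasks) / 1e6}%.1f MB for $tasks%d tasks")
  }
}
```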
[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server
[ https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892482#comment-15892482 ]

Apache Spark commented on SPARK-19796:

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/17140
[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server
[ https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892390#comment-15892390 ]

Imran Rashid commented on SPARK-19796:

Since it's a regression, I'm making this a blocker for 2.2.0 (otherwise we revert SPARK-17931, but the fix should be simple).