[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server

2017-03-06 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897997#comment-15897997
 ] 

Imran Rashid commented on SPARK-19796:
--

I'm opposed to (b) as well.

It feels wrong to do a one-off just for JOB_DESCRIPTION, but maybe it's a 
large enough savings that it's worth doing.  I was thinking of something larger, 
along the lines of SPARK-19108.  Another option would be to add new APIs, e.g., 
jobs would take `driverProperties` and `executorProperties`, but maybe that is 
overkill.
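
To make the `driverProperties` / `executorProperties` idea concrete, here is a rough sketch of what such a split might look like. Everything in it is hypothetical: `SplitProperties`, `partition` and `driverOnlyKeys` are names invented for illustration and do not exist in Spark.

```scala
import java.util.Properties

// Hypothetical container; neither this class nor its field names exist in Spark.
final case class SplitProperties(
    driverProperties: Properties,
    executorProperties: Properties)

object SplitProperties {
  // Routes each key either to the driver-only set or to the set that would be
  // shipped with tasks. The caller decides which keys are driver-only.
  def partition(all: Properties, driverOnlyKeys: Set[String]): SplitProperties = {
    val driver = new Properties()
    val executor = new Properties()
    val names = all.stringPropertyNames().toArray(new Array[String](0))
    names.foreach { key =>
      val target = if (driverOnlyKeys.contains(key)) driver else executor
      target.setProperty(key, all.getProperty(key))
    }
    SplitProperties(driver, executor)
  }
}
```

With a split like this, only `executorProperties` would need to be serialized per task, so a long job description kept in `driverProperties` would never reach the task serialization path.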

> taskScheduler fails serializing long statements received by thrift server
> -------------------------------------------------------------------------
>
>                 Key: SPARK-19796
>                 URL: https://issues.apache.org/jira/browse/SPARK-19796
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Giambattista
>            Assignee: Imran Rashid
>            Priority: Blocker
>             Fix For: 2.2.0
>
> This problem was observed after the changes made for SPARK-17931.
> In my use case I'm sending very long insert statements to the Spark thrift 
> server, and they fail at TaskDescription.scala:89 because writeUTF throws when 
> asked to write a string whose encoded form is longer than 64 KB (see 
> https://www.drillio.com/en/2009/java-encoded-string-too-long-64kb-limit/ for 
> a description of the issue).
> As suggested by Imran Rashid I tracked down the offending key: it is 
> "spark.job.description" and it contains the complete SQL statement.
> The problem can be reproduced by creating a table like:
> create table test (a int) using parquet
> and by sending an insert statement like:
> scala> val r = 1 to 128000
> scala> println("insert into table test values (" + r.mkString("),(") + ")")
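
The limit described above can be checked without Spark or the thrift server. The following is a minimal sketch that builds the same statement and passes it to `java.io.DataOutputStream.writeUTF`, which throws `UTFDataFormatException` when the encoded string exceeds 65535 bytes. The object and method names are made up for illustration, and `writeLengthPrefixed` is only one possible way around the limit, not necessarily what the fix for this issue does.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.charset.StandardCharsets
import scala.util.Try

object WriteUtfLimit {
  // Same statement as in the reproduction steps of the issue description.
  val statement: String =
    "insert into table test values (" + (1 to 128000).mkString("),(") + ")"

  // writeUTF stores the encoded length in a 2-byte prefix, so it rejects any
  // string whose encoded form is longer than 65535 bytes.
  def writeWithWriteUTF(s: String): Try[Array[Byte]] = Try {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeUTF(s)
    out.flush()
    buffer.toByteArray
  }

  // Illustrative alternative: a 4-byte length prefix followed by UTF-8 bytes.
  def writeLengthPrefixed(s: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    val encoded = s.getBytes(StandardCharsets.UTF_8)
    out.writeInt(encoded.length)
    out.write(encoded)
    out.flush()
    buffer.toByteArray
  }
}
```

`WriteUtfLimit.writeWithWriteUTF(WriteUtfLimit.statement)` returns a `Failure`, while the length-prefixed variant accepts the same string. A reader for the second format would need the matching `readInt` followed by `readFully`.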



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server

2017-03-02 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893736#comment-15893736
 ] 

Shivaram Venkataraman commented on SPARK-19796:
---

I think (a) is worth exploring in a new JIRA.  We should try to avoid sending 
data that we don't need on the executors during task execution.




[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server

2017-03-02 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893584#comment-15893584
 ] 

Mridul Muralidharan commented on SPARK-19796:
-


I would not prefer (b): if we are worried that users depend on a private 
property, sending a truncated version of it only aggravates the problem!  I 
would rather fail fast with a missing value.

Having said that, while we should limit our internal usage of properties, this 
map is also used to propagate user-specified key-value pairs, so adding limits 
or log messages might not be optimal.  Worst case, if we start detecting that 
the properties map is growing really large, we could broadcast it (ugh?).




[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server

2017-03-02 Thread Kay Ousterhout (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893376#comment-15893376
 ] 

Kay Ousterhout commented on SPARK-19796:


Do you think we should (separately) fix the underlying problem?  Specifically, 
we could:

(a) not send the SPARK_JOB_DESCRIPTION property to the workers, since it's only 
used on the master for the UI (and while users *could* access it, the variable 
name SPARK_JOB_DESCRIPTION is spark-private, which suggests that it shouldn't 
be used by users).  Perhaps this is too risky because users could be using it?

(b) Truncate SPARK_JOB_DESCRIPTION to something reasonable (100 characters?) 
before sending it to the workers.  This is more backwards compatible if users 
are actually reading the property, but maybe a useless intermediate approach?

(c) (Possibly in addition to one of the above) Log a warning if any of the 
properties is longer than 100 characters (or some threshold).

Thoughts?  I can file a JIRA if you think any of these is worthwhile.
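
For discussion, here is a rough sketch of what (a), (b) and (c) could look like on a plain `java.util.Properties`. It assumes the key is the `spark.job.description` named in the issue description and uses the 100-character threshold floated above. `JobPropertiesSketch` and its method names are hypothetical, not existing Spark code.

```scala
import java.util.Properties

object JobPropertiesSketch {
  val JobDescriptionKey = "spark.job.description" // key named in the issue
  val MaxLength = 100                             // threshold floated in (b)/(c)

  private def keys(props: Properties): Seq[String] =
    props.stringPropertyNames().toArray(new Array[String](0)).toSeq

  // Copies the properties so the caller's instance is never mutated.
  private def copyOf(props: Properties): Properties = {
    val result = new Properties()
    keys(props).foreach(k => result.setProperty(k, props.getProperty(k)))
    result
  }

  // (a) Drop the job description before the properties go to the workers.
  def withoutJobDescription(props: Properties): Properties = {
    val result = copyOf(props)
    result.remove(JobDescriptionKey)
    result
  }

  // (b) Keep the key but truncate its value.
  def withTruncatedJobDescription(props: Properties): Properties = {
    val result = copyOf(props)
    Option(result.getProperty(JobDescriptionKey)).foreach { description =>
      result.setProperty(JobDescriptionKey, description.take(MaxLength))
    }
    result
  }

  // (c) Keys whose values exceed the threshold, e.g. to log a warning for each.
  def oversizedKeys(props: Properties): Seq[String] =
    keys(props).filter(k => props.getProperty(k).length > MaxLength).sorted
}
```

All three helpers work on a copy, so the full description stays available on the master for the UI whichever option is chosen.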




[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server

2017-03-02 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892547#comment-15892547
 ] 

Imran Rashid commented on SPARK-19796:
--

[~kayousterhout] [~shivaram] here's another example of serializing lots of 
pointless data in each task.  In this case, {{TaskDescription.properties}} 
contains lots of data which the executors don't care about, and it gets 
serialized once per task.

For this jira, I'll just do a small fix, but I thought you might be interested 
in this.




[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server

2017-03-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892482#comment-15892482
 ] 

Apache Spark commented on SPARK-19796:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/17140




[jira] [Commented] (SPARK-19796) taskScheduler fails serializing long statements received by thrift server

2017-03-02 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892390#comment-15892390
 ] 

Imran Rashid commented on SPARK-19796:
--

Since it's a regression, I'm making this a blocker for 2.2.0 (or else we revert 
SPARK-17931, but the fix should be simple).
