[jira] [Comment Edited] (SPARK-20006) Separate threshold for broadcast and shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931054#comment-15931054 ] Zhan Zhang edited comment on SPARK-20006 at 3/18/17 4:42 AM: - The default ShuffledHashJoin threshold can fall back to the broadcast one. A separate configuration does give us opportunities to optimize the join dramatically. It would be great if the CBO could automatically find the best strategy, but perhaps I am missing something: currently the CBO does not collect the correct statistics, especially for partitioned tables. I have opened a JIRA for that issue as well: https://issues.apache.org/jira/browse/SPARK-19890 was (Author: zhzhan): The default ShuffledHashJoin threshold can fallback to the broadcast one. A separate configuration does provide us opportunities to optimize the join dramatically. It would be great if CBO can automatically find the best strategy. But probably I miss something. Currently the CBO does not collect right statistics, especially for partitioned table. https://issues.apache.org/jira/browse/SPARK-19890 > Separate threshold for broadcast and shuffled hash join > --- > > Key: SPARK-20006 > URL: https://issues.apache.org/jira/browse/SPARK-20006 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhan Zhang >Priority: Minor > > Currently both canBroadcast and canBuildLocalHashMap use the same > configuration: AUTO_BROADCASTJOIN_THRESHOLD. > But the memory model may be different. For broadcast, the hash map is > currently always built on heap. For shuffled hash join, the hash map may be > built on heap (longHash) or off heap (another map, if off-heap is enabled). > Sharing one configuration makes it hard to tune (how to allocate memory > on-heap/off-heap). I propose to use separate configurations. Please comment > on whether this is reasonable.
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
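The proposal above amounts to splitting one knob into two. A hypothetical configuration sketch: only `spark.sql.autoBroadcastJoinThreshold` actually exists as of Spark 2.1; the second property name is invented here purely to illustrate the separation being proposed.

```properties
# Existing: one threshold gates both broadcast join and shuffled hash join
spark.sql.autoBroadcastJoinThreshold   10485760
# Hypothetical, per this issue: a separate knob for the shuffled-hash-join
# build side, which may live off heap (property name is illustrative only)
spark.sql.shuffledHashJoinThreshold    52428800
```

With two knobs, a user could keep broadcast small (it always builds on heap) while allowing a larger build side for shuffled hash join when off-heap memory is available.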
[jira] [Commented] (SPARK-20006) Separate threshold for broadcast and shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931054#comment-15931054 ] Zhan Zhang commented on SPARK-20006: The default ShuffledHashJoin threshold can fall back to the broadcast one. A separate configuration does give us opportunities to optimize the join dramatically. It would be great if the CBO could automatically find the best strategy, but perhaps I am missing something: currently the CBO does not collect the correct statistics, especially for partitioned tables. https://issues.apache.org/jira/browse/SPARK-19890
[jira] [Commented] (SPARK-20006) Separate threshold for broadcast and shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931050#comment-15931050 ] Takeshi Yamamuro commented on SPARK-20006: -- I feel that the more options we have for controlling plan strategies, the more difficult it becomes for users to use DataFrame/Dataset. Essentially, I think the CBO should control these kinds of things.
[jira] [Commented] (SPARK-20009) Use user-friendly DDL formats for defining a schema in user-facing APIs
[ https://issues.apache.org/jira/browse/SPARK-20009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931044#comment-15931044 ] Takeshi Yamamuro commented on SPARK-20009: -- Does this make sense? cc: [~smilegator] My prototype is here: https://github.com/apache/spark/compare/master...maropu:UserDDLForSchema > Use user-friendly DDL formats for defining a schema in user-facing APIs > > > Key: SPARK-20009 > URL: https://issues.apache.org/jira/browse/SPARK-20009 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Takeshi Yamamuro > > In https://issues.apache.org/jira/browse/SPARK-19830, we added a new API in the > DDL parser to convert a DDL string into a schema. We can then use DDL > formats in some existing APIs, e.g., functions.from_json > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3062.
[jira] [Created] (SPARK-20009) Use user-friendly DDL formats for defining a schema in user-facing APIs
Takeshi Yamamuro created SPARK-20009: Summary: Use user-friendly DDL formats for defining a schema in user-facing APIs Key: SPARK-20009 URL: https://issues.apache.org/jira/browse/SPARK-20009 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0 Reporter: Takeshi Yamamuro In https://issues.apache.org/jira/browse/SPARK-19830, we added a new API in the DDL parser to convert a DDL string into a schema. We can then use DDL formats in some existing APIs, e.g., functions.from_json https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3062.
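To illustrate the user-facing format this issue is about: a compact DDL string such as "a INT, b STRING" replaces a programmatically built StructType. The toy parser below is NOT Spark's actual parser (CatalystSqlParser handles the real case); the object and method names are invented for illustration only.

```scala
// Toy sketch of the DDL-format idea: parse "name TYPE, name TYPE, ..."
// into (name, type) pairs. Spark's real API returns a StructType.
object DdlSchemaSketch {
  def parseDdl(ddl: String): Seq[(String, String)] =
    ddl.split(",").map(_.trim).filter(_.nonEmpty).map { field =>
      // Split each "name TYPE" field on the first run of whitespace.
      val Array(name, dataType) = field.split("\\s+", 2)
      (name, dataType.trim.toUpperCase)
    }.toSeq

  def main(args: Array[String]): Unit = {
    assert(parseDdl("a INT, b STRING") == Seq(("a", "INT"), ("b", "STRING")))
    println(parseDdl("a INT, b STRING"))
  }
}
```

The appeal for users is exactly this brevity: one string instead of `StructType(Seq(StructField(...), ...))` when calling, e.g., `functions.from_json`.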
[jira] [Commented] (SPARK-18886) Delay scheduling should not delay some executors indefinitely if one task is scheduled before delay timeout
[ https://issues.apache.org/jira/browse/SPARK-18886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931009#comment-15931009 ] Kay Ousterhout commented on SPARK-18886: Sorry for the slow response here! I realized this is the same issue as SPARK-11460 (although that JIRA proposed a slightly different solution), which stalled for reasons that are completely my fault (I neglected it because I couldn't think of a practical way of solving it). Imran, unfortunately I don't think your latest idea will quite work. Delay scheduling was originally intended for situations where the number of slots that a particular job could use was limited by a fairness policy. In that case, it can be better to wait a bit for a "better" slot (i.e., one that satisfies locality preferences). In particular, if you never wait, you end up with this "sticky slot" issue where tasks for a job keep finishing up in a "bad" slot (one with no locality preferences), which is then re-offered to the same job, which will again accept the bad slot. If the job just waited a bit, it could get a better slot (e.g., as a result of tasks from another job finishing). [1] This relates to your idea because of the following situation: suppose you have a cluster with 10 machines, the job has locality preferences for 5 of them (with ids 1, 2, 3, 4, 5), and fairness dictates that the job can only use 3 slots at a time (e.g., it's sharing equally with 2 other jobs). Suppose that for a long time, the job has been running tasks on slots 1, 2, and 3 (so local slots). At this point, the timers for machines 6, 7, 8, 9, and 10 will have expired, because the job has been running for a while. But if the job is now offered a slot on one of those non-local machines (e.g., 6), the job hasn't been waiting long for non-local resources: until this point, it's been running its full share of 3 slots at a time, and it's been doing so on machines that satisfy locality preferences. 
So, we shouldn't accept that slot on machine 6 -- we should wait a bit to see if we can get a slot on 1, 2, 3, 4, or 5. The solution I proposed (in a long PR comment) for the other JIRA is: if the task set is using fewer than the number of slots it could be using (where “# slots it could be using” is all of the slots in the cluster if the job is running alone, or the job’s fair share, if it’s not) for some period of time, increase the locality level. The problem with that solution is that I thought it was completely impractical to determine the number of slots a TSM "should" be allowed to use. However, after thinking about this more today, I think we might be able to do this in a practical way: - First, I thought that we could use information about when offers are rejected to determine this (e.g., if you've been rejecting offers for a while, then you're not using your fair share). But the problem here is that it's not easy to determine when you *are* using your fair / allowed share: accepting a single offer doesn't necessarily mean that you're now using the allowed share. This is precisely the problem with the current approach, hence this JIRA. - v1: one possible proxy for this is whether there are slots currently available that haven't been accepted by any job. The TaskSchedulerImpl could feasibly pass this information to each TaskSetManager, and the TSM could use it to update its delay timer: something like only reset the delay timer to 0 if (a) the TSM accepts an offer and (b) the flag passed by the TaskSchedulerImpl indicates that there are no other unused slots in the cluster. This fixes the problem described in the JIRA: in that case, the flag would indicate that there *were* other unused slots, even though a task got successfully scheduled with this offer, so the delay timer wouldn't be reset, and would eventually correctly expire. - v2: The problem with v1 is that it doesn't correctly handle situations where, e.g., you have two jobs A and B with equal shares. 
B is "greedy" and will accept any slot (e.g., it's a reduce stage), and A is doing delay scheduling. In this case, A might have much less than its share, but the flag from the TaskSchedulerImpl would indicate that there were no other free slots in the cluster, so the delay timer wouldn't ever expire. I suspect we could handle this (e.g., with some logic in the TaskSchedulerImpl to detect when a particular TSM is getting starved: when it keeps rejecting offers that are later accepted by someone else) but before thinking about this further, I wanted to run the general idea by you to see what your thoughts are. [1] There's a whole side question / discussion of how often this is useful for Spark at all. It can be useful if you're running in a shared cluster where e.g. Yarn might be assigning you more slots over time, and it's also useful when a single Spark context is being shared across many
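The "v1" reset rule described above can be sketched concretely. This is a hedged illustration only, not Spark's actual TaskSetManager logic: a timer that is reset only when an offer is accepted AND the scheduler reports no other idle slots in the cluster; all names are invented.

```scala
// Sketch of the v1 rule: accepting an offer no longer proves the task
// set has its full share, so the locality-wait timer is reset only when
// the offer is accepted AND nothing else in the cluster sits idle.
class DelayTimerSketch(localityWaitMs: Long) {
  private var timerStartMs: Long = 0L

  def start(nowMs: Long): Unit = timerStartMs = nowMs

  // Called after each resource offer is answered.
  def onOffer(accepted: Boolean, otherIdleSlotsExist: Boolean, nowMs: Long): Unit =
    if (accepted && !otherIdleSlotsExist) timerStartMs = nowMs

  // When true, the task set should fall back to a less local level.
  def shouldRelaxLocality(nowMs: Long): Boolean =
    nowMs - timerStartMs > localityWaitMs
}
```

Under this rule, the scenario in the JIRA behaves correctly: a task scheduled while other slots remain unused does not reset the timer, so the timer eventually expires and the task set relaxes its locality level.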
[jira] [Resolved] (SPARK-11460) Locality waits should be based on task set creation time, not last launch time
[ https://issues.apache.org/jira/browse/SPARK-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-11460. Resolution: Duplicate > Locality waits should be based on task set creation time, not last launch time > -- > > Key: SPARK-11460 > URL: https://issues.apache.org/jira/browse/SPARK-11460 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.2.2, > 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0, 1.5.1 > Environment: YARN >Reporter: Shengyue Ji > Original Estimate: 2h > Remaining Estimate: 2h > > Spark waits for the spark.locality.wait period before going from RACK_LOCAL to > ANY when selecting an executor for assignment. The timeout is essentially > reset each time a new assignment is made. > We were running Spark Streaming on Kafka with a 10 second batch window on 32 > Kafka partitions with 16 executors. All executors were in the ANY group. At > one point one RACK_LOCAL executor was added and all tasks were assigned to > it. Each task took about 0.6 seconds to process, resetting the > spark.locality.wait timeout (3000ms) repeatedly. This caused the whole > process to underutilize resources and created an increasing backlog. > spark.locality.wait should be based on the task set creation time, not the last > launch time, so that 3000ms after initial creation, all executors can get > tasks assigned to them. > We are specifying a zero timeout for now as a workaround to disable locality > optimization. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556
[jira] [Comment Edited] (SPARK-20000) Spark Hive tests aborted due to lz4-java on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931011#comment-15931011 ] Takeshi Yamamuro edited comment on SPARK-20000 at 3/18/17 2:22 AM: --- oh! congrats! was (Author: maropu): oh! congrat! > Spark Hive tests aborted due to lz4-java on ppc64le > --- > > Key: SPARK-20000 > URL: https://issues.apache.org/jira/browse/SPARK-20000 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.2.0 > Environment: Ubuntu 14.04 ppc64le > $ java -version > openjdk version "1.8.0_111" > OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14) > OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) >Reporter: Sonia Garudi > Labels: ppc64le > Attachments: hs_err_pid.log > > > The tests are getting aborted in the Spark Hive project with the following error: > {code:borderStyle=solid} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x3fff94dbf114, pid=6160, tid=0x3fff6efef1a0 > # > # JRE version: OpenJDK Runtime Environment (8.0_111-b14) (build > 1.8.0_111-8u111-b14-3~14.04.1-b14) > # Java VM: OpenJDK 64-Bit Server VM (25.111-b14 mixed mode linux-ppc64 > compressed oops) > # Problematic frame: > # V [libjvm.so+0x56f114] > {code} > In the thread log file, I found the following traces: > Event: 3669.042 Thread 0x3fff89976800 Exception 'java/lang/NoClassDefFoundError': Could not initialize class > net.jpountz.lz4.LZ4JNI (0x00079fcda3b8) thrown at > [/build/openjdk-8-fVIxxI/openjdk-8-8u111-b14/src/hotspot/src/share/vm/oops/instanceKlass.cpp, > line 890] > This error is due to lz4-java (version 1.3.0), which doesn’t have support > for ppc64le. PFA the thread log file.
[jira] [Commented] (SPARK-20000) Spark Hive tests aborted due to lz4-java on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931011#comment-15931011 ] Takeshi Yamamuro commented on SPARK-20000: -- oh! congrat!
[jira] [Created] (SPARK-20008) hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() returns 1
Ravindra Bajpai created SPARK-20008: --- Summary: hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() returns 1 Key: SPARK-20008 URL: https://issues.apache.org/jira/browse/SPARK-20008 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.2 Reporter: Ravindra Bajpai hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() yields 1 against the expected 0. This was not the case with Spark 1.5.2. This is an API change from a usage point of view, and hence I consider it a bug. It may be a boundary case, not sure. Workaround: for now I check that the counts are != 0 before this operation. That is not good for performance, hence creating a JIRA to track it. As Young Zhang explained in reply to my mail: starting from Spark 2, these kinds of operations are implemented as a left anti join, instead of using RDD operations directly. The same issue also occurs on sqlContext.
scala> spark.version
res25: String = 2.0.2
spark.sqlContext.emptyDataFrame.except(spark.sqlContext.emptyDataFrame).explain(true)
== Physical Plan ==
*HashAggregate(keys=[], functions=[], output=[])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[], output=[])
      +- BroadcastNestedLoopJoin BuildRight, LeftAnti, false
         :- Scan ExistingRDD[]
         +- BroadcastExchange IdentityBroadcastMode
            +- Scan ExistingRDD[]
This arguably indicates a bug, but my guess is that it is likely the logic of comparing NULL = NULL (should it return true or false?) that causes this kind of confusion.
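For reference, the semantics the reporter expects: `except` keeps the distinct rows of the left input that do not appear in the right, so with two empty inputs the result (and its count) must be empty. A trivial model using plain Scala collections, not Spark:

```scala
// Reference model of except(): distinct left rows not present in right.
// With two empty inputs the result must be empty (count 0), whereas the
// Spark 2.0.2 left-anti-join plan quoted above returns count 1.
object ExceptModel {
  def except[T](left: Seq[T], right: Seq[T]): Seq[T] =
    left.distinct.filterNot(right.toSet)

  def main(args: Array[String]): Unit = {
    assert(except(Seq.empty[Int], Seq.empty[Int]).isEmpty)
  }
}
```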
[jira] [Comment Edited] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930922#comment-15930922 ] Jeff Zhang edited comment on SPARK-20001 at 3/18/17 12:17 AM: -- Thanks [~dansanduleac] It looks like we are doing similar things; recently I made some improvements in SPARK-13587. You can check my doc if you are interested: https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit And maybe we can combine our approaches to get this accepted by the community ASAP. VirtualEnv is a pretty important feature for PySpark, IMHO. was (Author: zjffdu): Thanks [~dansanduleac] It looks like we are doing similar things, recently I made some improvements in SPARK-13587, you can check my doc if you are interested. https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit > Support PythonRunner executing inside a Conda env > - > > Key: SPARK-20001 > URL: https://issues.apache.org/jira/browse/SPARK-20001 > Project: Spark > Issue Type: New Feature > Components: PySpark, Spark Core >Affects Versions: 2.2.0 >Reporter: Dan Sanduleac > Original Estimate: 168h > Remaining Estimate: 168h > > Similar to SPARK-13587, I'm trying to allow the user to configure a Conda > environment that PythonRunner will run from. > This change remembers the conda environment found on the driver and installs > the same packages on the executor side, only once per PythonWorkerFactory. > The list of requested conda packages is added to the PythonWorkerFactory > cache, so two collects using the same environment (incl. packages) can re-use > the same running executors. > You have to specify outright what packages and channels to "bootstrap" the > environment with. > However, SparkContext (as well as JavaSparkContext & the pyspark version) is > expanded to support addCondaPackage and addCondaChannel. 
> Rationale is: > * you might want to add more packages once you're already running in the > driver > * you might want to add a channel which requires some token for > authentication, which you don't yet have access to until the module is > already running > This issue requires that the conda binary already be available on the driver > as well as the executors; you just have to specify where it can be found. > Please see the attached pull request on palantir/spark for additional > details: https://github.com/palantir/spark/pull/115 > As for tests, there is a local Python test, as well as YARN client- and > cluster-mode tests, which ensure that a newly installed package is visible > from both the driver and the executor.
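The caching behavior described above (two collects with the same environment reuse the same running executors) can be sketched as keying the worker factory by the requested channels and packages. The class and method names below are illustrative only, not the actual palantir/spark implementation:

```scala
import scala.collection.mutable

// Illustrative key: a worker factory is identified by the conda
// channels and packages requested for the environment.
case class CondaEnvKey(channels: List[String], packages: List[String])

class WorkerFactoryCache {
  private val cache = mutable.Map.empty[CondaEnvKey, String]
  private var created = 0

  // Return the cached factory for this environment, creating one only
  // the first time this (channels, packages) combination is seen.
  def getOrCreate(key: CondaEnvKey): String =
    cache.getOrElseUpdate(key, { created += 1; s"factory-$created" })

  def factoriesCreated: Int = created
}
```

Two jobs requesting an identical environment hit the same cache entry, so the (expensive) conda install on the executor side happens once per distinct environment rather than once per job.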
[jira] [Comment Edited] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930922#comment-15930922 ] Jeff Zhang edited comment on SPARK-20001 at 3/18/17 12:14 AM: -- Thanks [~dansanduleac] It looks like we are doing similar things, recently I made some improvements in SPARK-13587, you can check my doc if you are interested. https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit was (Author: zjffdu): Thanks [~dansanduleac] It looks like we are do similar things, recently I made some improvements in SPARK-13587, you can check my doc if you are interested. https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit
[jira] [Issue Comment Deleted] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated SPARK-20001: --- Comment: was deleted (was: Thanks [~dansanduleac] It looks like we are do similar things, recently I made some improvements in SPARK-13587, you can check my doc if you are interested. https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit)
[jira] [Commented] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930924#comment-15930924 ] Jeff Zhang commented on SPARK-20001: Thanks [~dansanduleac] It looks like we are do similar things, recently I made some improvements in SPARK-13587, you can check my doc if you are interested. https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit
[jira] [Commented] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930922#comment-15930922 ] Jeff Zhang commented on SPARK-20001: Thanks [~dansanduleac] It looks like we are do similar things, recently I made some improvements in SPARK-13587, you can check my doc if you are interested. https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit
[jira] [Resolved] (SPARK-18890) Do all task serialization in CoarseGrainedExecutorBackend thread (rather than TaskSchedulerImpl)
[ https://issues.apache.org/jira/browse/SPARK-18890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-18890. Resolution: Invalid I closed this because, as [~imranr] pointed out on the PR, these already happen in the same thread. [~witgo], can you change your PR to reference SPARK-19486, which describes the behavior you implemented? > Do all task serialization in CoarseGrainedExecutorBackend thread (rather than > TaskSchedulerImpl) > > > Key: SPARK-18890 > URL: https://issues.apache.org/jira/browse/SPARK-18890 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: Kay Ousterhout >Priority: Minor > > As part of benchmarking this change: > https://github.com/apache/spark/pull/15505 and alternatives, [~shivaram] and > I found that moving task serialization from TaskSetManager (which happens as > part of the TaskSchedulerImpl's thread) to CoarseGrainedSchedulerBackend leads > to approximately a 10% reduction in job runtime for a job that counted 10,000 > partitions (that each had 1 int) using 20 machines. Similar performance > improvements were reported in the pull request linked above. This would > appear to be because the TaskSchedulerImpl thread is the bottleneck, so > moving serialization to CGSB reduces runtime. This change may *not* improve > runtime (and could potentially worsen runtime) in scenarios where the CGSB > thread is the bottleneck (e.g., if tasks are very large, so calling launch to > send the tasks to the executor blocks on the network). > One benefit of implementing this change is that it makes it easier to > parallelize the serialization of tasks (different tasks could be serialized > by different threads). Another benefit is that all of the serialization > occurs in the same place (currently, the Task is serialized in > TaskSetManager, and the TaskDescription is serialized in CGSB). 
> I'm not totally convinced we should fix this because it seems like there are > better ways of reducing the serialization time (e.g., by re-using a single > serialized object with the Task/jars/files and broadcasting it for each > stage) but I wanted to open this JIRA to document the discussion. > cc [~witgo] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
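The parallelization benefit mentioned above (different tasks serialized by different threads, once serialization is out of the single scheduler thread) can be sketched with a toy example. This is plain Python with pickle and a thread pool, not Spark code; `serialize_task` is an illustrative stand-in for serializing a TaskDescription:

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

def serialize_task(task):
    # Stand-in for serializing a TaskDescription before launch.
    return pickle.dumps(task)

tasks = [{"stage": 0, "partition": p, "payload": list(range(10))} for p in range(100)]

# Single-threaded: everything runs on one scheduler thread (the bottleneck).
serialized_serial = [serialize_task(t) for t in tasks]

# Parallel: once serialization happens outside that thread, different tasks
# can be serialized by different worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    serialized_parallel = list(pool.map(serialize_task, tasks))

# Same bytes either way; only the threading changes.
assert serialized_serial == serialized_parallel
```

As the resolution notes, this only helps when the serializing thread is the bottleneck; if launch itself blocks on the network, moving the work does not.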
[jira] [Commented] (SPARK-19565) After fetching failed, success of old attempt of stage should be taken as valid.
[ https://issues.apache.org/jira/browse/SPARK-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930910#comment-15930910 ] Kay Ousterhout commented on SPARK-19565: [~jinxing6...@126.com] I closed this because it looks like a duplicate of the work you did for SPARK-19263. Feel free to re-open if I've misunderstood. > After fetching failed, success of old attempt of stage should be taken as > valid. > > > Key: SPARK-19565 > URL: https://issues.apache.org/jira/browse/SPARK-19565 > Project: Spark > Issue Type: Test > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: jin xing > > This is related to SPARK-19263. > When fetch failed, stage will be resubmitted. There can be running tasks from > both old and new stage attempts. Success of tasks from old stage attempt > should be taken as valid and partitionId should be removed from stage's > pendingPartitions accordingly. When pending partitions is empty, downstream > stage can be scheduled, even though there's still running tasks in the > active(new) stage attempt. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19565) After fetching failed, success of old attempt of stage should be taken as valid.
[ https://issues.apache.org/jira/browse/SPARK-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-19565. Resolution: Duplicate > After fetching failed, success of old attempt of stage should be taken as > valid. > > > Key: SPARK-19565 > URL: https://issues.apache.org/jira/browse/SPARK-19565 > Project: Spark > Issue Type: Test > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: jin xing > > This is related to SPARK-19263. > When fetch failed, stage will be resubmitted. There can be running tasks from > both old and new stage attempts. Success of tasks from old stage attempt > should be taken as valid and partitionId should be removed from stage's > pendingPartitions accordingly. When pending partitions is empty, downstream > stage can be scheduled, even though there's still running tasks in the > active(new) stage attempt. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
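The bookkeeping described in the issue (a success from an old stage attempt still removes its partition from pendingPartitions) can be sketched as a toy model. This is an illustration, not Spark's DAGScheduler; the names are hypothetical:

```python
# Toy model: a stage tracks which partitions still need a successful task.
pending = {0, 1, 2}          # partitions of the resubmitted stage
active_attempt = 1           # new attempt started after a fetch failure

def on_task_success(partition, attempt):
    # A success from an *old* attempt is still a valid partition output,
    # so it is removed from pendingPartitions regardless of the attempt id.
    pending.discard(partition)

on_task_success(0, attempt=0)   # straggler from the old attempt finishes
on_task_success(1, attempt=1)
on_task_success(2, attempt=1)

# Once nothing is pending, the downstream stage can be scheduled, even if
# tasks from the active attempt are still running.
assert not pending
```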
[jira] [Commented] (SPARK-19755) Blacklist is always active for MesosCoarseGrainedSchedulerBackend. As result - scheduler cannot create an executor after some time.
[ https://issues.apache.org/jira/browse/SPARK-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930909#comment-15930909 ] Kay Ousterhout commented on SPARK-19755: I'm closing this because the configs you're proposing adding already exist: spark.blacklist.enabled already exists to turn off all blacklisting (this is false by default, so the fact that you're seeing blacklisting behavior means that your configuration enables blacklisting), and spark.blacklist.maxFailedTaskPerExecutor is the other thing you proposed adding. All of the blacklisting parameters are listed here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala#L101 Feel free to re-open this if I've misunderstood and the existing configs don't address the issues you're seeing! > Blacklist is always active for MesosCoarseGrainedSchedulerBackend. As result > - scheduler cannot create an executor after some time. > --- > > Key: SPARK-19755 > URL: https://issues.apache.org/jira/browse/SPARK-19755 > Project: Spark > Issue Type: Bug > Components: Mesos, Scheduler >Affects Versions: 2.1.0 > Environment: mesos, marathon, docker - driver and executors are > dockerized. >Reporter: Timur Abakumov > > When for some reason task fails - MesosCoarseGrainedSchedulerBackend > increased failure counter for a slave where that task was running. > When counter is >=2 (MAX_SLAVE_FAILURES) mesos slave is excluded. > Over time scheduler cannot create a new executor - every slave is is in the > blacklist. Task failure not necessary related to host health- especially for > long running stream apps. > If accepted as a bug: possible solution is to use: spark.blacklist.enabled to > make that functionality optional and if it make sense MAX_SLAVE_FAILURES > also can be configurable. 
[jira] [Resolved] (SPARK-19755) Blacklist is always active for MesosCoarseGrainedSchedulerBackend. As result - scheduler cannot create an executor after some time.
[ https://issues.apache.org/jira/browse/SPARK-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-19755. Resolution: Not A Problem > Blacklist is always active for MesosCoarseGrainedSchedulerBackend. As result > - scheduler cannot create an executor after some time. > --- > > Key: SPARK-19755 > URL: https://issues.apache.org/jira/browse/SPARK-19755 > Project: Spark > Issue Type: Bug > Components: Mesos, Scheduler >Affects Versions: 2.1.0 > Environment: mesos, marathon, docker - driver and executors are > dockerized. >Reporter: Timur Abakumov > > When for some reason task fails - MesosCoarseGrainedSchedulerBackend > increased failure counter for a slave where that task was running. > When counter is >=2 (MAX_SLAVE_FAILURES) mesos slave is excluded. > Over time scheduler cannot create a new executor - every slave is is in the > blacklist. Task failure not necessary related to host health- especially for > long running stream apps. > If accepted as a bug: possible solution is to use: spark.blacklist.enabled to > make that functionality optional and if it make sense MAX_SLAVE_FAILURES > also can be configurable. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19873) If the user changes the number of shuffle partitions between batches, Streaming aggregation will fail.
[ https://issues.apache.org/jira/browse/SPARK-19873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-19873. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17216 [https://github.com/apache/spark/pull/17216] > If the user changes the number of shuffle partitions between batches, > Streaming aggregation will fail. > -- > > Key: SPARK-19873 > URL: https://issues.apache.org/jira/browse/SPARK-19873 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.0 >Reporter: Kunal Khamar > Fix For: 2.2.0 > > > If the user changes the shuffle partition number between batches, Streaming > aggregation will fail. > Here are some possible cases: > - Change "spark.sql.shuffle.partitions" > - Use "repartition" and change the partition number in codes > - RangePartitioner doesn't generate deterministic partitions. Right now it's > safe as we disallow sort before aggregation. Not sure if we will add some > operators using RangePartitioner in future. > Fix: > Record # shuffle partitions in offset log and enforce in next batch -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
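The fix described at the end of the issue (record the shuffle partition count in the offset log and enforce it in the next batch) can be sketched as follows. This is a hypothetical model of the idea, not the Structured Streaming implementation:

```python
def shuffle_partitions_for_batch(offset_log, batch_id, session_conf):
    """Return the partition count to use, preferring what the offset log recorded."""
    recorded = offset_log.get(batch_id - 1, {}).get("shuffle_partitions")
    if recorded is not None:
        # Enforce the recorded value so state-store partitioning stays stable,
        # even if the user changed spark.sql.shuffle.partitions between batches.
        return recorded
    return session_conf["spark.sql.shuffle.partitions"]

offset_log = {0: {"shuffle_partitions": 200}}
conf = {"spark.sql.shuffle.partitions": 50}   # user changed it between batches

assert shuffle_partitions_for_batch(offset_log, 1, conf) == 200  # recorded value wins
assert shuffle_partitions_for_batch(offset_log, 0, conf) == 50   # first batch uses conf
```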
[jira] [Resolved] (SPARK-19967) Add from_json APIs to SQL
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19967. - Resolution: Fixed Fix Version/s: 2.2.0 > Add from_json APIs to SQL > - > > Key: SPARK-19967 > URL: https://issues.apache.org/jira/browse/SPARK-19967 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Takeshi Yamamuro > Fix For: 2.2.0 > > > The method "from_json" is a useful method in turning a string column into a > nested StructType with a user specified schema. The schema should be > specified in the DDL format -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19967) Add from_json APIs to SQL
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-19967: --- Assignee: Takeshi Yamamuro > Add from_json APIs to SQL > - > > Key: SPARK-19967 > URL: https://issues.apache.org/jira/browse/SPARK-19967 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Takeshi Yamamuro > Fix For: 2.2.0 > > > The method "from_json" is a useful method in turning a string column into a > nested StructType with a user specified schema. The schema should be > specified in the DDL format -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19967) Add from_json APIs to SQL
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930796#comment-15930796 ] Xiao Li commented on SPARK-19967: - Resolved in the PR https://github.com/apache/spark/commit/7de66bae58733595cb88ec899640f7acf734d5c4 > Add from_json APIs to SQL > - > > Key: SPARK-19967 > URL: https://issues.apache.org/jira/browse/SPARK-19967 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li > Fix For: 2.2.0 > > > The method "from_json" is a useful method in turning a string column into a > nested StructType with a user specified schema. The schema should be > specified in the DDL format -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
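The semantics being added (turn a JSON string column into nested fields according to a user-supplied schema, yielding null on malformed input) can be approximated in plain Python. This illustrates the behavior only; it is not the Spark implementation, and the dict-based schema is a stand-in for the DDL string:

```python
import json

def from_json(col_value, schema):
    """schema: {field_name: type}, loosely mirroring a DDL string like 'a INT, b STRING'."""
    try:
        parsed = json.loads(col_value)
        return {name: typ(parsed[name]) if name in parsed else None
                for name, typ in schema.items()}
    except (json.JSONDecodeError, TypeError, ValueError):
        return None  # Spark's from_json yields null for unparsable input

schema = {"a": int, "b": str}
assert from_json('{"a": 1, "b": "x"}', schema) == {"a": 1, "b": "x"}
assert from_json('{"a": 1}', schema) == {"a": 1, "b": None}  # missing field -> null
assert from_json("not json", schema) is None
```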
[jira] [Created] (SPARK-20007) Make SparkR apply() functions robust to workers that return empty data.frame
Hossein Falaki created SPARK-20007: -- Summary: Make SparkR apply() functions robust to workers that return empty data.frame Key: SPARK-20007 URL: https://issues.apache.org/jira/browse/SPARK-20007 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 2.2.0 Reporter: Hossein Falaki When using {{gapply()}} (or other members of the {{apply()}} family) with a schema, Spark will try to parse data returned from the R process on each worker as Spark DataFrame Rows based on the schema. In this case our provided schema suggests that we have six columns. When an R worker returns results to the JVM, SparkSQL will try to access its columns one by one and cast them to proper types. If an R worker returns nothing, the JVM will throw an {{ArrayIndexOutOfBoundsException}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
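A minimal model of the failure mode: the receiving side indexes the columns of whatever rows the worker returned, so a worker that returns zero rows blows up unless the empty case is handled first. This is a Python sketch of the idea (an `IndexError` here is the analogue of the `ArrayIndexOutOfBoundsException`); the names are hypothetical, not SparkR internals:

```python
SCHEMA = ["c1", "c2", "c3", "c4", "c5", "c6"]  # six columns, as in the description

def rows_to_typed(rows):
    # Robust version: an empty result from a worker simply contributes zero
    # output rows, instead of being indexed column-by-column and failing.
    if not rows:
        return []
    return [{col: row[i] for i, col in enumerate(SCHEMA)} for row in rows]

assert rows_to_typed([]) == []                           # empty data.frame from a worker
assert rows_to_typed([(1, 2, 3, 4, 5, 6)])[0]["c6"] == 6
```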
[jira] [Resolved] (SPARK-18847) PageRank gives incorrect results for graphs with sinks
[ https://issues.apache.org/jira/browse/SPARK-18847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-18847. - Resolution: Fixed Assignee: Andrew Ray Fix Version/s: 2.2.0 > PageRank gives incorrect results for graphs with sinks > -- > > Key: SPARK-18847 > URL: https://issues.apache.org/jira/browse/SPARK-18847 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.0.2 >Reporter: Andrew Ray >Assignee: Andrew Ray > Fix For: 2.2.0 > > > Sink vertices (those with no outgoing edges) should evenly distribute their > rank to the entire graph but in the current implementation it is just lost. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
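The fix described in the issue (a sink vertex's rank is split evenly across the whole graph instead of being lost) can be sketched with a small dense PageRank. This is pure Python for illustration, not the GraphX implementation:

```python
def pagerank(n, edges, iters=50, d=0.85):
    """edges: list of (src, dst). Sink vertices distribute their rank to everyone."""
    out = [[] for _ in range(n)]
    for s, t in edges:
        out[s].append(t)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Rank held by vertices with no outgoing edges: redistribute it evenly
        # rather than dropping it, so total rank is conserved.
        sink_mass = sum(rank[v] for v in range(n) if not out[v])
        new = [(1 - d) / n + d * sink_mass / n] * n
        for v in range(n):
            for t in out[v]:
                new[t] += d * rank[v] / len(out[v])
        rank = new
    return rank

# Vertex 2 is a sink; with redistribution the ranks still sum to 1.
r = pagerank(3, [(0, 1), (1, 2)])
assert abs(sum(r) - 1.0) < 1e-9
```

Without the `sink_mass` term, each iteration would leak the sink's rank and the vector would no longer sum to 1, which is the incorrect behavior being reported.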
[jira] [Closed] (SPARK-16878) YARN - topology information
[ https://issues.apache.org/jira/browse/SPARK-16878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chopra closed SPARK-16878. -- Resolution: Unresolved Withdrawing issue. As a stop-gap, a file based topology mapper can be used where needed. > YARN - topology information > > > Key: SPARK-16878 > URL: https://issues.apache.org/jira/browse/SPARK-16878 > Project: Spark > Issue Type: Sub-task > Components: YARN >Reporter: Shubham Chopra > > Block replication strategies need topology information for ideal block > placements. This information is available in resource managers and/or can be > provided separately through scripts/services/classes, as in the case of HDFS. > This jira focuses on enhancing spark-yarn package to suitably extract > topology information from YARN. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20006) Separate threshold for broadcast and shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-20006: --- Description: Currently both canBroadcast and canBuildLocalHashMap use the same configuration: AUTO_BROADCASTJOIN_THRESHOLD. But the memory model may be different. For broadcast, currently the hash map is always build on heap. For shuffledHashJoin, the hash map may be build on heap(longHash), or off heap(other map if off heap is enabled). The same configuration makes the configuration hard to tune (how to allocate memory onheap/offheap). Propose to use different configuration. Please comments whether it is reasonable. was: Currently both canBroadcast and canBuildLocalHashMap use the same configuration: AUTO_BROADCASTJOIN_THRESHOLD. But the memory model may be different. For broadcast, currently the hash map is always build on heap. For shuffledHashJoin, the hash map may be build on heap(longHash), or off heap(other map if off heap is enabled). The same configuration makes the configuration hard to tune (how to allocate memory onheap/offheap). Propose to use different configuration. > Separate threshold for broadcast and shuffled hash join > --- > > Key: SPARK-20006 > URL: https://issues.apache.org/jira/browse/SPARK-20006 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhan Zhang >Priority: Minor > > Currently both canBroadcast and canBuildLocalHashMap use the same > configuration: AUTO_BROADCASTJOIN_THRESHOLD. > But the memory model may be different. For broadcast, currently the hash map > is always build on heap. For shuffledHashJoin, the hash map may be build on > heap(longHash), or off heap(other map if off heap is enabled). The same > configuration makes the configuration hard to tune (how to allocate memory > onheap/offheap). Propose to use different configuration. Please comments > whether it is reasonable. 
[jira] [Created] (SPARK-20006) Separate threshold for broadcast and shuffled hash join
Zhan Zhang created SPARK-20006: -- Summary: Separate threshold for broadcast and shuffled hash join Key: SPARK-20006 URL: https://issues.apache.org/jira/browse/SPARK-20006 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Zhan Zhang Priority: Minor Currently both canBroadcast and canBuildLocalHashMap use the same configuration: AUTO_BROADCASTJOIN_THRESHOLD. But the memory model may be different. For broadcast, currently the hash map is always build on heap. For shuffledHashJoin, the hash map may be build on heap(longHash), or off heap(other map if off heap is enabled). The same configuration makes the configuration hard to tune (how to allocate memory onheap/offheap). Propose to use different configuration. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
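The proposal above (one threshold for broadcast, a separate one for building a local hash map in a shuffled hash join) can be sketched as a strategy chooser. The function and parameter names are hypothetical; the real decision lives in Spark's join planning via canBroadcast and canBuildLocalHashMap:

```python
def choose_join(build_side_bytes, broadcast_threshold, local_hashmap_threshold):
    # Today both checks read the single AUTO_BROADCASTJOIN_THRESHOLD, even
    # though broadcast maps are built on-heap while shuffled-hash maps may be
    # on- or off-heap; splitting the thresholds decouples the two decisions.
    if build_side_bytes <= broadcast_threshold:
        return "BroadcastHashJoin"
    if build_side_bytes <= local_hashmap_threshold:
        return "ShuffledHashJoin"
    return "SortMergeJoin"

MB = 1 << 20
assert choose_join(5 * MB, broadcast_threshold=10 * MB, local_hashmap_threshold=100 * MB) == "BroadcastHashJoin"
assert choose_join(50 * MB, 10 * MB, 100 * MB) == "ShuffledHashJoin"
assert choose_join(500 * MB, 10 * MB, 100 * MB) == "SortMergeJoin"
```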
[jira] [Updated] (SPARK-20005) There is no "Newline" in UI in description
[ https://issues.apache.org/jira/browse/SPARK-20005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Egor Pahomov updated SPARK-20005: - Description: There is no "newline" in UI in description: https://ibb.co/bLp2yv (was: Just see the attachment) > There is no "Newline" in UI in description > --- > > Key: SPARK-20005 > URL: https://issues.apache.org/jira/browse/SPARK-20005 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.1.0 >Reporter: Egor Pahomov >Priority: Minor > > There is no "newline" in UI in description: https://ibb.co/bLp2yv -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20005) There is no "Newline" in UI in description
Egor Pahomov created SPARK-20005: Summary: There is no "Newline" in UI in description Key: SPARK-20005 URL: https://issues.apache.org/jira/browse/SPARK-20005 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.1.0 Reporter: Egor Pahomov Priority: Minor Just see the attachment -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20004) Spark thrift server ovewrites spark.app.name
[ https://issues.apache.org/jira/browse/SPARK-20004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Egor Pahomov updated SPARK-20004: - Summary: Spark thrift server ovewrites spark.app.name (was: Spark thrift server ovverides spark.app.name) > Spark thrift server ovewrites spark.app.name > > > Key: SPARK-20004 > URL: https://issues.apache.org/jira/browse/SPARK-20004 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Egor Pahomov >Priority: Minor > > {code} > export SPARK_YARN_APP_NAME="ODBC server $host" > /spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host > --conf spark.app.name="ODBC server $host" > {code} > And spark-defaults.conf contains: > {code} > spark.app.name "ODBC server spark01" > {code} > Still name in yarn is "Thrift JDBC/ODBC Server" -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20004) Spark thrift server ovverides spark.app.name
Egor Pahomov created SPARK-20004: Summary: Spark thrift server ovverides spark.app.name Key: SPARK-20004 URL: https://issues.apache.org/jira/browse/SPARK-20004 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Egor Pahomov Priority: Minor {code} export SPARK_YARN_APP_NAME="ODBC server $host" /spark/sbin/start-thriftserver.sh --conf spark.yarn.queue=spark.client.$host --conf spark.app.name="ODBC server $host" {code} And spark-defaults.conf contains: {code} spark.app.name "ODBC server spark01" {code} Still name in yarn is "Thrift JDBC/ODBC Server" -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
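The report above boils down to configuration precedence: the thrift server sets its own app name after the user-supplied values are applied, so the hardcoded name wins over both spark-defaults.conf and --conf. A toy precedence model (illustrative only, not the actual SparkSubmit/HiveThriftServer2 code path):

```python
def effective_conf(defaults, cli_conf, hardcoded):
    conf = dict(defaults)       # spark-defaults.conf
    conf.update(cli_conf)       # --conf flags
    conf.update(hardcoded)      # values the thrift server sets itself, last
    return conf

conf = effective_conf(
    defaults={"spark.app.name": "ODBC server spark01"},
    cli_conf={"spark.app.name": "ODBC server host"},
    hardcoded={"spark.app.name": "Thrift JDBC/ODBC Server"},
)
# The hardcoded value wins, which is exactly the reported behavior.
assert conf["spark.app.name"] == "Thrift JDBC/ODBC Server"
```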
[jira] [Updated] (SPARK-20003) FPGrowthModel setMinConfidence should affect rules generation and transform
[ https://issues.apache.org/jira/browse/SPARK-20003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuhao yang updated SPARK-20003: --- Description: I was doing some testing and found this issue. FPGrowthModel setMinConfidence should affect rules generation and transform. Currently associationRules in FPGrowthModel is a lazy val and setMinConfidence in FPGrowthModel has no impact once associationRules got computed. was: FPGrowthModel setMinConfidence should affect rules generation and transform. Currently associationRules in FPGrowthModel is a lazy val and setMinConfidence in FPGrowthModel has no impact once associationRules got computed . > FPGrowthModel setMinConfidence should affect rules generation and transform > --- > > Key: SPARK-20003 > URL: https://issues.apache.org/jira/browse/SPARK-20003 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.2.0 >Reporter: yuhao yang >Priority: Minor > > I was doing some testing and found this issue. FPGrowthModel setMinConfidence > should affect rules generation and transform. > Currently associationRules in FPGrowthModel is a lazy val and > setMinConfidence in FPGrowthModel has no impact once associationRules got > computed. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20003) FPGrowthModel setMinConfidence should affect rules generation and transform
yuhao yang created SPARK-20003: -- Summary: FPGrowthModel setMinConfidence should affect rules generation and transform Key: SPARK-20003 URL: https://issues.apache.org/jira/browse/SPARK-20003 Project: Spark Issue Type: Bug Components: ML Affects Versions: 2.2.0 Reporter: yuhao yang Priority: Minor FPGrowthModel setMinConfidence should affect rules generation and transform. Currently associationRules in FPGrowthModel is a lazy val and setMinConfidence in FPGrowthModel has no impact once associationRules got computed . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
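The root cause described above — associationRules is a lazy val, so a later setMinConfidence has no effect once it has been computed — has a close Python analogue in functools.cached_property. This is an illustration of the caching pitfall, not the Spark ML code:

```python
from functools import cached_property

class FPGrowthModelSketch:
    """Hypothetical stand-in for a model with a lazily cached derived value."""

    def __init__(self, min_confidence):
        self.min_confidence = min_confidence

    @cached_property
    def association_rules(self):
        # Computed once on first access, like a Scala lazy val.
        return f"rules@minConfidence={self.min_confidence}"

m = FPGrowthModelSketch(min_confidence=0.8)
first = m.association_rules      # forces computation
m.min_confidence = 0.5           # the later "setter"...
assert m.association_rules == first  # ...has no effect: the cached value wins
```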
[jira] [Commented] (SPARK-19941) Spark should not schedule tasks on executors on decommissioning YARN nodes
[ https://issues.apache.org/jira/browse/SPARK-19941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930433#comment-15930433 ] Karthik Palaniappan commented on SPARK-19941: - Yeah, I could have been more clear. The application *should* continue, but the driver should drain executors *on decommissioning nodes* similar to how YARN is draining the NMs. All other executors should continue running. > Spark should not schedule tasks on executors on decommissioning YARN nodes > -- > > Key: SPARK-19941 > URL: https://issues.apache.org/jira/browse/SPARK-19941 > Project: Spark > Issue Type: Improvement > Components: Scheduler, YARN >Affects Versions: 2.1.0 > Environment: Hadoop 2.8.0-rc1 >Reporter: Karthik Palaniappan > > Hadoop 2.8 added a mechanism to gracefully decommission Node Managers in > YARN: https://issues.apache.org/jira/browse/YARN-914 > Essentially you can mark nodes to be decommissioned, and let them a) finish > work in progress and b) finish serving shuffle data. But no new work will be > scheduled on the node. > Spark should respect when NMs are set to decommissioned, and similarly > decommission executors on those nodes by not scheduling any more tasks on > them. > It looks like in the future YARN may inform the app master when containers > will be killed: https://issues.apache.org/jira/browse/YARN-3784. However, I > don't think Spark should schedule based on a timeout. We should gracefully > decommission the executor as fast as possible (which is the spirit of > YARN-914). The app master can query the RM for NM statuses (if it doesn't > already have them) and stop scheduling on executors on NMs that are > decommissioning. > Stretch feature: The timeout may be useful in determining whether running > further tasks on the executor is even helpful. Spark may be able to tell that > shuffle data will not be consumed by the time the node is decommissioned, so > it is not worth computing. The executor can be killed immediately. 
[jira] [Comment Edited] (SPARK-19941) Spark should not schedule tasks on executors on decommissioning YARN nodes
[ https://issues.apache.org/jira/browse/SPARK-19941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930433#comment-15930433 ] Karthik Palaniappan edited comment on SPARK-19941 at 3/17/17 6:28 PM: -- Yeah, I could have been more clear. The application *should* continue, but the driver should drain executors *on decommissioning nodes* similar to how YARN is draining the NMs. All other executors can continue to have tasks scheduled on them. was (Author: karthik palaniappan): Yeah, I could have been more clear. The application *should* continue, but the driver should drain executors *on decommissioning nodes* similar to how YARN is draining the NMs. All other executors should continue running. > Spark should not schedule tasks on executors on decommissioning YARN nodes > -- > > Key: SPARK-19941 > URL: https://issues.apache.org/jira/browse/SPARK-19941 > Project: Spark > Issue Type: Improvement > Components: Scheduler, YARN >Affects Versions: 2.1.0 > Environment: Hadoop 2.8.0-rc1 >Reporter: Karthik Palaniappan > > Hadoop 2.8 added a mechanism to gracefully decommission Node Managers in > YARN: https://issues.apache.org/jira/browse/YARN-914 > Essentially you can mark nodes to be decommissioned, and let them a) finish > work in progress and b) finish serving shuffle data. But no new work will be > scheduled on the node. > Spark should respect when NMs are set to decommissioned, and similarly > decommission executors on those nodes by not scheduling any more tasks on > them. > It looks like in the future YARN may inform the app master when containers > will be killed: https://issues.apache.org/jira/browse/YARN-3784. However, I > don't think Spark should schedule based on a timeout. We should gracefully > decommission the executor as fast as possible (which is the spirit of > YARN-914). The app master can query the RM for NM statuses (if it doesn't > already have them) and stop scheduling on executors on NMs that are > decommissioning. 
> Stretch feature: The timeout may be useful in determining whether running > further tasks on the executor is even helpful. Spark may be able to tell that > shuffle data will not be consumed by the time the node is decommissioned, so > it is not worth computing. The executor can be killed immediately. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
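The scheduling change requested above amounts to filtering executors on decommissioning nodes out of the set the driver will offer tasks to, while letting their in-flight work finish. A toy sketch (hypothetical names; not Spark's YARN scheduler backend):

```python
def schedulable_executors(executors, node_states):
    """Keep executors whose node is not DECOMMISSIONING; running tasks finish."""
    return [e for e in executors
            if node_states.get(e["node"]) != "DECOMMISSIONING"]

executors = [{"id": 1, "node": "nm-a"}, {"id": 2, "node": "nm-b"}]
states = {"nm-a": "RUNNING", "nm-b": "DECOMMISSIONING"}  # from RM node reports

# Only the executor on the healthy node receives new tasks.
assert [e["id"] for e in schedulable_executors(executors, states)] == [1]
```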
[jira] [Updated] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash updated SPARK-20001: --- Description: Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from. This change remembers theconda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory. The list of requested conda packages are added to the PythonWorkerFactory cache, so two collects using the same environment (incl packages) can re-use the same running executors. You have to specify outright what packages and channels to "bootstrap" the environment with. However, SparkContext (as well as JavaSparkContext & the pyspark version) are expanded to support addCondaPackage and addCondaChannel. Rationale is: * you might want to add more packages once you're already running in the driver * you might want to add a channel which requires some token for authentication, which you don't yet have access to until the module is already running This issue requires that the conda binary is already available on the driver as well as executors, you just have to specify where it can be found. Please see the attached pull request on palantir/spark for additional details: https://github.com/palantir/spark/pull/115 As for tests, there is a local python test, as well as yarn client & cluster-mode tests, which ensure that a newly installed package is visible from both the driver and the executor. was: Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from. This change remembers theconda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory. The list of requested conda packages are added to the PythonWorkerFactory cache, so two collects using the same environment (incl packages) can re-use the same running executors. 
You have to specify outright what packages and channels to "bootstrap" the environment with. However, SparkContext (as well as JavaSparkContext & the pyspark version) are expanded to support addCondaPackage and addCondaChannel. Rationale is: * you might want to add more packages once you're already running in the driver * you might want to add a channel which requires some token for authentication, which you don't yet have access to until the module is already running This issue requires that the conda binary is already available on the driver as well as executors, you just have to specify where it can be found. Please see the attached issue on palantir/spark for additional details: https://github.com/palantir/spark/pull/115 As for tests, there is a local python test, as well as yarn client & cluster-mode tests, which ensure that a newly installed package is visible from both the driver and the executor. > Support PythonRunner executing inside a Conda env > - > > Key: SPARK-20001 > URL: https://issues.apache.org/jira/browse/SPARK-20001 > Project: Spark > Issue Type: New Feature > Components: PySpark, Spark Core >Affects Versions: 2.2.0 >Reporter: Dan Sanduleac > Original Estimate: 168h > Remaining Estimate: 168h > > Similar to SPARK-13587, I'm trying to allow the user to configure a Conda > environment that PythonRunner will run from. > This change remembers the conda environment found on the driver and installs > the same packages on the executor side, only once per PythonWorkerFactory. > The list of requested conda packages is added to the PythonWorkerFactory > cache, so two collects using the same environment (incl. packages) can re-use > the same running executors. > You have to specify outright what packages and channels to "bootstrap" the > environment with. > However, SparkContext (as well as JavaSparkContext & the pyspark version) are > expanded to support addCondaPackage and addCondaChannel. 
> Rationale is: > * you might want to add more packages once you're already running in the > driver > * you might want to add a channel which requires some token for > authentication, which you don't yet have access to until the module is > already running > This issue requires that the conda binary is already available on the driver > as well as executors, you just have to specify where it can be found. > Please see the attached pull request on palantir/spark for additional > details: https://github.com/palantir/spark/pull/115 > As for tests, there is a local python test, as well as yarn client & > cluster-mode tests, which ensure that a newly installed package is visible > from both the driver and the executor.
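The worker-reuse behaviour described above (one factory per conda environment spec, shared across collects that request the same channels and packages) can be sketched in plain Python. Note that `WorkerFactory`, `get_worker_factory`, and the cache itself are hypothetical stand-ins for illustration, not the actual PySpark internals:

```python
# Hypothetical sketch of a PythonWorkerFactory cache keyed by the
# requested conda channels and packages (names are illustrative only).

class WorkerFactory:
    """Stands in for a per-environment pool of Python worker processes."""
    def __init__(self, channels, packages):
        self.channels = tuple(channels)
        self.packages = tuple(sorted(packages))

_factories = {}

def get_worker_factory(channels, packages):
    # Two collects requesting the same environment (incl. packages)
    # hit the same cache entry, so they re-use the same workers.
    key = (tuple(channels), tuple(sorted(packages)))
    if key not in _factories:
        _factories[key] = WorkerFactory(channels, packages)
    return _factories[key]
```

Sorting the package list makes the cache key order-insensitive, so requesting the same packages in a different order still reuses the running executors.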
[jira] [Updated] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash updated SPARK-20001: --- Description: Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from. This change remembers the conda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory. The list of requested conda packages is added to the PythonWorkerFactory cache, so two collects using the same environment (incl. packages) can re-use the same running executors. You have to specify outright what packages and channels to "bootstrap" the environment with. However, SparkContext (as well as JavaSparkContext & the pyspark version) are expanded to support addCondaPackage and addCondaChannel. Rationale is: * you might want to add more packages once you're already running in the driver * you might want to add a channel which requires some token for authentication, which you don't yet have access to until the module is already running This issue requires that the conda binary is already available on the driver as well as executors, you just have to specify where it can be found. Please see the attached pull request on palantir/spark for additional details: https://github.com/palantir/spark/pull/115 As for tests, there is a local python test, as well as yarn client & cluster-mode tests, which ensure that a newly installed package is visible from both the driver and the executor. was: Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from. This change remembers the conda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory. The list of requested conda packages is added to the PythonWorkerFactory cache, so two collects using the same environment (incl. packages) can re-use the same running executors. 
You have to specify outright what packages and channels to "bootstrap" the environment with. However, SparkContext (as well as JavaSparkContext & the pyspark version) are expanded to support addCondaPackage and addCondaChannel. Rationale is: * you might want to add more packages once you're already running in the driver * you might want to add a channel which requires some token for authentication, which you don't yet have access to until the module is already running This issue requires that the conda binary is already available on the driver as well as executors, you just have to specify where it can be found. Please see the attached issue on palantir/spark for additional details. As for tests, there is a local python test, as well as yarn client & cluster-mode tests, which ensure that a newly installed package is visible from both the driver and the executor. > Support PythonRunner executing inside a Conda env > - > > Key: SPARK-20001 > URL: https://issues.apache.org/jira/browse/SPARK-20001 > Project: Spark > Issue Type: New Feature > Components: PySpark, Spark Core >Affects Versions: 2.2.0 >Reporter: Dan Sanduleac > Original Estimate: 168h > Remaining Estimate: 168h > > Similar to SPARK-13587, I'm trying to allow the user to configure a Conda > environment that PythonRunner will run from. > This change remembers the conda environment found on the driver and installs > the same packages on the executor side, only once per PythonWorkerFactory. > The list of requested conda packages is added to the PythonWorkerFactory > cache, so two collects using the same environment (incl. packages) can re-use > the same running executors. > You have to specify outright what packages and channels to "bootstrap" the > environment with. > However, SparkContext (as well as JavaSparkContext & the pyspark version) are > expanded to support addCondaPackage and addCondaChannel. 
> Rationale is: > * you might want to add more packages once you're already running in the > driver > * you might want to add a channel which requires some token for > authentication, which you don't yet have access to until the module is > already running > This issue requires that the conda binary is already available on the driver > as well as executors, you just have to specify where it can be found. > Please see the attached pull request on palantir/spark for additional details: > https://github.com/palantir/spark/pull/115 > As for tests, there is a local python test, as well as yarn client & > cluster-mode tests, which ensure that a newly installed package is visible > from both the driver and the executor.
[jira] [Updated] (SPARK-18971) Netty issue may cause the shuffle client hang
[ https://issues.apache.org/jira/browse/SPARK-18971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18971: - Description: Check https://github.com/netty/netty/issues/6153 for details You should be able to see the following similar stack trace in the executor thread dump. {code} "shuffle-client-7-4" daemon prio=5 tid=97 RUNNABLE at io.netty.util.Recycler$Stack.scavengeSome(Recycler.java:504) at io.netty.util.Recycler$Stack.scavenge(Recycler.java:454) at io.netty.util.Recycler$Stack.pop(Recycler.java:435) at io.netty.util.Recycler.get(Recycler.java:144) at io.netty.buffer.PooledUnsafeDirectByteBuf.newInstance(PooledUnsafeDirectByteBuf.java:39) at io.netty.buffer.PoolArena$DirectArena.newByteBuf(PoolArena.java:727) at io.netty.buffer.PoolArena.allocate(PoolArena.java:140) at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177) at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168) at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:129) at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) at java.lang.Thread.run(Thread.java:745) {code} was:Check 
https://github.com/netty/netty/issues/6153 for details > Netty issue may cause the shuffle client hang > - > > Key: SPARK-18971 > URL: https://issues.apache.org/jira/browse/SPARK-18971 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.2.0 > > > Check https://github.com/netty/netty/issues/6153 for details > You should be able to see the following similar stack trace in the executor > thread dump. > {code} > "shuffle-client-7-4" daemon prio=5 tid=97 RUNNABLE > at io.netty.util.Recycler$Stack.scavengeSome(Recycler.java:504) > at io.netty.util.Recycler$Stack.scavenge(Recycler.java:454) > at io.netty.util.Recycler$Stack.pop(Recycler.java:435) > at io.netty.util.Recycler.get(Recycler.java:144) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.newInstance(PooledUnsafeDirectByteBuf.java:39) > at > io.netty.buffer.PoolArena$DirectArena.newByteBuf(PoolArena.java:727) > at io.netty.buffer.PoolArena.allocate(PoolArena.java:140) > at > io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168) > at > io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:129) > at > io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451) > at > 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) > at java.lang.Thread.run(Thread.java:745) > {code}
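Since the hang shows up as a RUNNABLE shuffle-client thread parked in `Recycler$Stack.scavengeSome`, a captured thread dump can be checked for that telltale frame. The helper below is a hypothetical diagnostic sketch, not part of Spark or Netty:

```python
# Hypothetical helper: flag threads in a jstack-style dump whose frames
# include the Netty Recycler scavenge loop (see netty/netty#6153).

def find_stuck_recycler_threads(dump_text):
    stuck = []
    current = None
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith('"'):
            # Thread header line, e.g. "shuffle-client-7-4" daemon prio=5 ...
            current = line.split('"')[1]
        elif "Recycler$Stack.scavengeSome" in line and current:
            stuck.append(current)
    return stuck
```

Running it over a dump saved from a hung executor would list the shuffle-client threads spinning in the Recycler, matching the stack trace quoted above.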
[jira] [Resolved] (SPARK-19986) Make pyspark.streaming.tests.CheckpointTests more stable
[ https://issues.apache.org/jira/browse/SPARK-19986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-19986. --- Resolution: Fixed Fix Version/s: 2.2.0 2.0.3 2.1.1 Issue resolved by pull request 17323 [https://github.com/apache/spark/pull/17323] > Make pyspark.streaming.tests.CheckpointTests more stable > > > Key: SPARK-19986 > URL: https://issues.apache.org/jira/browse/SPARK-19986 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 2.1.0 >Reporter: Shixiong Zhu > Fix For: 2.1.1, 2.0.3, 2.2.0 > > > Sometimes, CheckpointTests will hang because the streaming jobs are too slow > and cannot catch up.
[jira] [Created] (SPARK-20002) Add support for unions between streaming and batch datasets
Leon Pham created SPARK-20002: - Summary: Add support for unions between streaming and batch datasets Key: SPARK-20002 URL: https://issues.apache.org/jira/browse/SPARK-20002 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.2 Reporter: Leon Pham Currently unions between streaming datasets and batch datasets are not supported.
[jira] [Resolved] (SPARK-19721) Good error message for version mismatch in log files
[ https://issues.apache.org/jira/browse/SPARK-19721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-19721. -- Resolution: Fixed Fix Version/s: 2.1.1 > Good error message for version mismatch in log files > > > Key: SPARK-19721 > URL: https://issues.apache.org/jira/browse/SPARK-19721 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.0 >Reporter: Michael Armbrust >Assignee: Liwei Lin >Priority: Blocker > Fix For: 2.1.1, 2.2.0 > > > There are several places where we write out version identifiers in various > logs for structured streaming (usually {{v1}}). However, in the places where > we check for this, we throw a confusing error message. Instead, we should do > the following: > - Find all of the places where we do this kind of check. > - for {{vN}} where {{N>1}} say "UnsupportedLogFormat: The file {{path}} was > produced by a newer version of Spark and cannot be read by this version. > Please upgrade" > - for anything else throw an error saying the file is malformed.
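The behaviour SPARK-19721 asks for — accept {{v1}}, give a clear upgrade message for newer versions, and a malformed-file error for anything else — could look roughly like the sketch below. `validate_log_version` and its exact messages are illustrative, not the actual Structured Streaming code:

```python
# Illustrative version check for a streaming log file header,
# following the behaviour proposed in SPARK-19721.
import re

SUPPORTED_VERSION = 1  # this build reads log format v1


def validate_log_version(path, version_line):
    m = re.fullmatch(r"v(\d+)", version_line.strip())
    if m is None:
        # Not a vN identifier at all: the file is malformed.
        raise ValueError(f"Malformed log file {path}: bad version line {version_line!r}")
    n = int(m.group(1))
    if n > SUPPORTED_VERSION:
        # Written by a newer Spark: tell the user to upgrade, not that
        # the file is corrupt.
        raise ValueError(
            f"UnsupportedLogFormat: the file {path} was produced by a newer "
            f"version of Spark and cannot be read by this version. Please upgrade."
        )
    return n
```

The key point is distinguishing the two failure modes, so a version skew no longer surfaces as a confusing parse error.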
[jira] [Updated] (SPARK-20001) Support PythonRunner executing inside a Conda env
[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Sanduleac updated SPARK-20001: -- Description: Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from. This change remembers the conda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory. The list of requested conda packages is added to the PythonWorkerFactory cache, so two collects using the same environment (incl. packages) can re-use the same running executors. You have to specify outright what packages and channels to "bootstrap" the environment with. However, SparkContext (as well as JavaSparkContext & the pyspark version) are expanded to support addCondaPackage and addCondaChannel. Rationale is: * you might want to add more packages once you're already running in the driver * you might want to add a channel which requires some token for authentication, which you don't yet have access to until the module is already running This issue requires that the conda binary is already available on the driver as well as executors, you just have to specify where it can be found. Please see the attached issue on palantir/spark for additional details. As for tests, there is a local python test, as well as yarn client & cluster-mode tests, which ensure that a newly installed package is visible from both the driver and the executor. was: Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from. This change remembers the conda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory. The list of requested conda packages is added to the PythonWorkerFactory cache, so two collects using the same environment (incl. packages) can re-use the same running executors. 
This issue requires that the conda binary is already available on the driver as well as executors, you just have to specify where it can be found. Please see the attached issue on palantir/spark for additional details. > Support PythonRunner executing inside a Conda env > - > > Key: SPARK-20001 > URL: https://issues.apache.org/jira/browse/SPARK-20001 > Project: Spark > Issue Type: New Feature > Components: PySpark, Spark Core >Affects Versions: 2.2.0 >Reporter: Dan Sanduleac > Original Estimate: 168h > Remaining Estimate: 168h > > Similar to SPARK-13587, I'm trying to allow the user to configure a Conda > environment that PythonRunner will run from. > This change remembers the conda environment found on the driver and installs > the same packages on the executor side, only once per PythonWorkerFactory. > The list of requested conda packages is added to the PythonWorkerFactory > cache, so two collects using the same environment (incl. packages) can re-use > the same running executors. > You have to specify outright what packages and channels to "bootstrap" the > environment with. > However, SparkContext (as well as JavaSparkContext & the pyspark version) are > expanded to support addCondaPackage and addCondaChannel. > Rationale is: > * you might want to add more packages once you're already running in the > driver > * you might want to add a channel which requires some token for > authentication, which you don't yet have access to until the module is > already running > This issue requires that the conda binary is already available on the driver > as well as executors, you just have to specify where it can be found. > Please see the attached issue on palantir/spark for additional details. > As for tests, there is a local python test, as well as yarn client & > cluster-mode tests, which ensure that a newly installed package is visible > from both the driver and the executor. 
[jira] [Created] (SPARK-20001) Support PythonRunner executing inside a Conda env
Dan Sanduleac created SPARK-20001: - Summary: Support PythonRunner executing inside a Conda env Key: SPARK-20001 URL: https://issues.apache.org/jira/browse/SPARK-20001 Project: Spark Issue Type: New Feature Components: PySpark, Spark Core Affects Versions: 2.2.0 Reporter: Dan Sanduleac Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from. This change remembers the conda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory. The list of requested conda packages is added to the PythonWorkerFactory cache, so two collects using the same environment (incl. packages) can re-use the same running executors. This issue requires that the conda binary is already available on the driver as well as executors, you just have to specify where it can be found. Please see the attached issue on palantir/spark for additional details.
[jira] [Resolved] (SPARK-19997) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-19997. Resolution: Duplicate > proxy-user failed connecting to a kerberos configured metastore > --- > > Key: SPARK-19997 > URL: https://issues.apache.org/jira/browse/SPARK-19997 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Kent Yao > > Start running spark-sql via proxy-user on a kerberos configured hadoop cluster > and metastore > {code} > bin/spark-sql --proxy-user hzyaoqin > {code} > Failed with the following err: > {code:java} > 17/03/17 16:05:41 INFO hive.metastore: Trying to connect to metastore with > URI thrift://xxx:9083 > 17/03/17 16:05:41 ERROR transport.TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) > at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > at > org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) > at > 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366) > at > org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270) > at > org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at >
[jira] [Commented] (SPARK-19982) JavaDatasetSuite.testJavaBeanEncoder sometimes fails with "Unable to generate an encoder for inner class"
[ https://issues.apache.org/jira/browse/SPARK-19982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930094#comment-15930094 ] Jose Soltren commented on SPARK-19982: -- Pasting in the exception from the link in case the link dies: Unable to generate an encoder for inner class `test.org.apache.spark.sql.JavaDatasetSuite$SimpleJavaBean` without access to the scope that this class was defined in. Try moving this class out of its parent class.; org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `test.org.apache.spark.sql.JavaDatasetSuite$SimpleJavaBean` without access to the scope that this class was defined in. Try moving this class out of its parent class.; at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$2.applyOrElse(ExpressionEncoder.scala:264) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$2.applyOrElse(ExpressionEncoder.scala:260) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:242) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:248) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:248) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:265) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at 
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:305) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:248) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:233) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:260) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:78) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:89) at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:507) at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:520) at test.org.apache.spark.sql.JavaDatasetSuite.testJavaBeanEncoder(JavaDatasetSuite.java:696) > JavaDatasetSuite.testJavaBeanEncoder sometimes fails with "Unable to generate > an encoder for inner class" > - > > Key: SPARK-19982 > URL: https://issues.apache.org/jira/browse/SPARK-19982 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.1.0 >Reporter: Jose Soltren > Labels: flaky-test > > JavaDatasetSuite.testJavaBeanEncoder fails sporadically with the error below: > Unable to generate an encoder for inner class > `test.org.apache.spark.sql.JavaDatasetSuite$SimpleJavaBean` without access to > the scope that this class was defined in. Try moving this class out of its > parent class. > From https://spark-tests.appspot.com/test-logs/35475788 > [~vanzin] looked into this back in October and reported: > I ran this test in a loop (both alone and with the rest of the spark-sql > tests) and never got a failure. I even used the same JDK as Jenkins > (1.7.0_51). 
> Also looked at the code and nothing seems wrong. The error is when an entry > with the parent class name is missing from the map kept in OuterScopes.scala, > but the test populates that map in its first line. So it doesn't look like a > race nor some issue with weak references (the map uses weak values). > public void testJavaBeanEncoder() { > OuterScopes.addOuterScope(this);
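The weak-reference failure mode ruled out above is easy to picture with a weak-valued map in any language: an entry vanishes as soon as the last strong reference to the outer instance is dropped. The sketch below uses Python's `weakref.WeakValueDictionary` purely as an analogy for the OuterScopes map; `Outer` and the key name are illustrative, not Spark code:

```python
# Analogy for a weak-valued scope registry: once the registered outer
# instance is no longer strongly referenced, its map entry disappears.
import gc
import weakref

class Outer:
    """Stands in for the enclosing test-suite instance held by the registry."""

scopes = weakref.WeakValueDictionary()

outer = Outer()
scopes["JavaDatasetSuite"] = outer
found_before = "JavaDatasetSuite" in scopes  # strong reference still alive

del outer       # drop the only strong reference
gc.collect()    # the weakly-held value is now collectable
found_after = "JavaDatasetSuite" in scopes
```

In the JIRA test, `this` stays strongly referenced for the duration of the test, which is why the weak values alone would not explain the missing entry.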
[jira] [Commented] (SPARK-13369) Number of consecutive fetch failures for a stage before the job is aborted should be configurable
[ https://issues.apache.org/jira/browse/SPARK-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930063#comment-15930063 ] Imran Rashid commented on SPARK-13369: -- Thanks for fixing this [~sitalke...@gmail.com]. I just noticed that earlier in this ticket, there was a discussion about the need to set this config for streaming. I don't believe that is true; the way this works, it actually should be fine for occasional fetch failures in a long-lived streaming job. The maximum number of fetch failures is per-stage, and the count is reset when the stage is run successfully. Can you explain why you'd need to modify this config for a streaming job? (The large cluster case at Facebook makes sense to me, as we discussed on the pr, and I updated the jira description accordingly.) > Number of consecutive fetch failures for a stage before the job is aborted > should be configurable > -- > > Key: SPARK-13369 > URL: https://issues.apache.org/jira/browse/SPARK-13369 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia >Assignee: Sital Kedia >Priority: Minor > Fix For: 2.2.0 > > > The previously hardcoded max 4 retries per stage is not suitable for all > cluster configurations. Since spark retries a stage at the first sign of a > fetch failure, you can easily end up with many stage retries to discover all > the failures. In particular, two scenarios in which this value should change are (1) > if there are more than 4 executors per node; in that case, it may take 4 > retries to discover the problem with each executor on the node and (2) during > cluster maintenance on large clusters, where multiple machines are serviced > at once, but you also cannot afford total cluster downtime. By making this > value configurable, cluster managers can tune this value to something more > appropriate to their cluster configuration. 
[jira] [Updated] (SPARK-13369) Number of consecutive fetch failures for a stage before the job is aborted should be configurable
[ https://issues.apache.org/jira/browse/SPARK-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-13369: - Description: The previously hardcoded max 4 retries per stage is not suitable for all cluster configurations. Since spark retries a stage at the first sign of a fetch failure, you can easily end up with many stage retries to discover all the failures. In particular, two scenarios in which this value should change are (1) if there are more than 4 executors per node; in that case, it may take 4 retries to discover the problem with each executor on the node and (2) during cluster maintenance on large clusters, where multiple machines are serviced at once, but you also cannot afford total cluster downtime. By making this value configurable, cluster managers can tune this value to something more appropriate to their cluster configuration. (was: Currently it is hardcoded in the code. We need to make it configurable because for long running jobs, the chances of fetch failures due to machine reboot are high and we need a configuration parameter to bump up that number. ) > Number of consecutive fetch failures for a stage before the job is aborted > should be configurable > -- > > Key: SPARK-13369 > URL: https://issues.apache.org/jira/browse/SPARK-13369 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia >Assignee: Sital Kedia >Priority: Minor > Fix For: 2.2.0 > > > The previously hardcoded max 4 retries per stage is not suitable for all > cluster configurations. Since spark retries a stage at the first sign of a > fetch failure, you can easily end up with many stage retries to discover all > the failures. 
In particular, two scenarios this value should change are (1) > if there are more than 4 executors per node; in that case, it may take 4 > retries to discover the problem with each executor on the node and (2) during > cluster maintenance on large clusters, where multiple machines are serviced > at once, but you also cannot afford total cluster downtime. By making this > value configurable, cluster managers can tune this value to something more > appropriate to their cluster configuration. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
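The mechanism under discussion boils down to counting consecutive fetch-failure-triggered stage retries against a configurable limit. A minimal sketch of that decision logic (an illustration only, not Spark's actual DAGScheduler code; the class and method names here are invented for this example, and the default of 4 mirrors the previously hardcoded limit):

```java
// Simplified sketch of the retry-abort decision discussed in this ticket.
// Illustrative only; not Spark's actual scheduler code.
public class StageRetryPolicy {
    private final int maxConsecutiveFetchFailures;
    private int consecutiveFetchFailures = 0;

    public StageRetryPolicy(int maxConsecutiveFetchFailures) {
        this.maxConsecutiveFetchFailures = maxConsecutiveFetchFailures;
    }

    /** Record one fetch-failure-triggered stage retry; true means abort the job. */
    public boolean onFetchFailure() {
        consecutiveFetchFailures++;
        return consecutiveFetchFailures >= maxConsecutiveFetchFailures;
    }

    /** A successful stage attempt resets the consecutive-failure count. */
    public void onStageSuccess() {
        consecutiveFetchFailures = 0;
    }

    public static void main(String[] args) {
        StageRetryPolicy policy = new StageRetryPolicy(4);
        boolean abort = false;
        for (int i = 1; i <= 4 && !abort; i++) {
            // With the default limit of 4, the fourth consecutive failure aborts.
            abort = policy.onFetchFailure();
            System.out.println("fetch failure " + i + " -> abort job: " + abort);
        }
    }
}
```

The fix that shipped in 2.2.0 exposes this limit as a Spark setting (reportedly `spark.stage.maxConsecutiveAttempts`), so cluster operators can raise it for the multi-executor-per-node and maintenance scenarios described above.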
[jira] [Assigned] (SPARK-13369) Number of consecutive fetch failures for a stage before the job is aborted should be configurable
[ https://issues.apache.org/jira/browse/SPARK-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-13369: Assignee: Imran Rashid > Number of consecutive fetch failures for a stage before the job is aborted > should be configurable > -- > > Key: SPARK-13369 > URL: https://issues.apache.org/jira/browse/SPARK-13369 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia >Assignee: Imran Rashid >Priority: Minor > Fix For: 2.2.0 > > > Currently it is hardcode inside code. We need to make it configurable because > for long running jobs, the chances of fetch failures due to machine reboot is > high and we need a configuration parameter to bump up that number. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13369) Number of consecutive fetch failures for a stage before the job is aborted should be configurable
[ https://issues.apache.org/jira/browse/SPARK-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-13369: Assignee: Sital Kedia (was: Imran Rashid) > Number of consecutive fetch failures for a stage before the job is aborted > should be configurable > -- > > Key: SPARK-13369 > URL: https://issues.apache.org/jira/browse/SPARK-13369 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia >Assignee: Sital Kedia >Priority: Minor > Fix For: 2.2.0 > > > Currently it is hardcode inside code. We need to make it configurable because > for long running jobs, the chances of fetch failures due to machine reboot is > high and we need a configuration parameter to bump up that number. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13369) Number of consecutive fetch failures for a stage before the job is aborted should be configurable
[ https://issues.apache.org/jira/browse/SPARK-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-13369. -- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17307 [https://github.com/apache/spark/pull/17307] > Number of consecutive fetch failures for a stage before the job is aborted > should be configurable > -- > > Key: SPARK-13369 > URL: https://issues.apache.org/jira/browse/SPARK-13369 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia >Priority: Minor > Fix For: 2.2.0 > > > Currently it is hardcode inside code. We need to make it configurable because > for long running jobs, the chances of fetch failures due to machine reboot is > high and we need a configuration parameter to bump up that number. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19996) transfer spark-defaults.conf to spark-defaults.xml
[ https://issues.apache.org/jira/browse/SPARK-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-19996. --- Resolution: Invalid > transfer spark-defaults.conf to spark-defaults.xml > -- > > Key: SPARK-19996 > URL: https://issues.apache.org/jira/browse/SPARK-19996 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: hanzhi > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929979#comment-15929979 ]

Deenbandhu Agarwal commented on SPARK-19644:
--------------------------------------------
Any updates?

> Memory leak in Spark Streaming
> ------------------------------
>
>                 Key: SPARK-19644
>                 URL: https://issues.apache.org/jira/browse/SPARK-19644
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.2
>        Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
>            Reporter: Deenbandhu Agarwal
>            Priority: Critical
>              Labels: memory_leak, performance
>         Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png
>
>
> I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old-generation heap capacity from 1161216 bytes to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of the class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
> I think this is a clear case of a memory leak.
[jira] [Commented] (SPARK-20000) Spark Hive tests aborted due to lz4-java on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929965#comment-15929965 ]

Sean Owen commented on SPARK-20000:
-----------------------------------
20000!

> Spark Hive tests aborted due to lz4-java on ppc64le
> ---------------------------------------------------
>
>                 Key: SPARK-20000
>                 URL: https://issues.apache.org/jira/browse/SPARK-20000
>             Project: Spark
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.2.0
>        Environment: Ubuntu 14.04 ppc64le
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>            Reporter: Sonia Garudi
>              Labels: ppc64le
>         Attachments: hs_err_pid.log
>
>
> The tests are getting aborted in the Spark Hive project with the following error:
> {code:borderStyle=solid}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x3fff94dbf114, pid=6160, tid=0x3fff6efef1a0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_111-b14) (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.111-b14 mixed mode linux-ppc64 compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x56f114]
> {code}
> In the thread log file, I found the following traces:
> Event: 3669.042 Thread 0x3fff89976800 Exception 'java/lang/NoClassDefFoundError': Could not initialize class net.jpountz.lz4.LZ4JNI (0x00079fcda3b8) thrown at [/build/openjdk-8-fVIxxI/openjdk-8-8u111-b14/src/hotspot/src/share/vm/oops/instanceKlass.cpp, line 890]
> This error is due to lz4-java (version 1.3.0), which doesn't have support for ppc64le. PFA the thread log file.
[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster
[ https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929762#comment-15929762 ]

narendra maru commented on SPARK-19992:
---------------------------------------
Yes, I have the same YARN jars and SPARK_HOME directory on all the nodes, at the same location.

> spark-submit on deployment-mode cluster
> ---------------------------------------
>
>                 Key: SPARK-19992
>                 URL: https://issues.apache.org/jira/browse/SPARK-19992
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.0.2
>        Environment: spark version 2.0.2
> hadoop version 2.6.0
>            Reporter: narendra maru
>
> spark version 2.0.2
> hadoop version 2.6.0
> spark-submit command:
> "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode cluster --jars /home/ec2-user/jars/hgmongonew.jar,/home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"
> after adding the following in
> 1. spark-defaults.conf:
> spark.executor.extraJavaOptions -Dconfig.fuction.conf
> spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
> spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
> spark.eventLog.enabled=true
> 2. yarn-site.xml:
> yarn.application.classpath
> /usr/local/hadoop-2.6.0/etc/hadoop,
> /usr/local/hadoop-2.6.0/,
> /usr/local/hadoop-2.6.0/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/,
> /usr/local/hadoop-2.6.0/share/hadoop/common/lib/
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
> /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
> /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/,
> /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
> /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar
>
> Error in the log:
> Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
> Error on the terminal:
> diagnostics: Application application_1489673977198_0002 failed 2 times due to AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1
> For more detailed output, check the application tracking page: http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/ Then, click on links to logs of each attempt.
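When the ApplicationMaster class cannot be loaded, a quick sanity check is to verify that the jars referenced by `spark.yarn.jars` actually contain it. A small helper for that check (a hypothetical diagnostic utility written for this discussion, not part of Spark; the demo builds a synthetic jar rather than assuming any jar exists on disk):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarClassCheck {
    // Return true if the given jar contains the .class entry for className.
    public static boolean jarContainsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a synthetic jar; in practice point jarContainsClass at the
        // jars your spark.yarn.jars setting resolves to and ask for
        // org.apache.spark.deploy.yarn.ApplicationMaster.
        File demo = File.createTempFile("demo", ".jar");
        try (JarOutputStream jos = new JarOutputStream(new FileOutputStream(demo))) {
            jos.putNextEntry(new JarEntry("org/apache/spark/deploy/yarn/ApplicationMaster.class"));
            jos.closeEntry();
        }
        System.out.println(jarContainsClass(demo.getPath(),
                "org.apache.spark.deploy.yarn.ApplicationMaster")); // prints true
        demo.delete();
    }
}
```

If the check returns false for every jar the configuration points to, the `spark.yarn.jars` glob (here `local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*`) is likely not matching the jar that actually contains the ApplicationMaster.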
[jira] [Updated] (SPARK-19996) transfer spark-defaults.conf to spark-defaults.xml
[ https://issues.apache.org/jira/browse/SPARK-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hanzhi updated SPARK-19996:
---------------------------
    Docs Text: Supports XML include, which is more convenient.  (was: Supports the XML include function, which is more convenient.)

> transfer spark-defaults.conf to spark-defaults.xml
> --------------------------------------------------
>
>                 Key: SPARK-19996
>                 URL: https://issues.apache.org/jira/browse/SPARK-19996
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: hanzhi
>
[jira] [Updated] (SPARK-20000) Spark Hive tests aborted due to lz4-java on ppc64le
[ https://issues.apache.org/jira/browse/SPARK-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sonia Garudi updated SPARK-20000:
---------------------------------
    Attachment: hs_err_pid.log

> Spark Hive tests aborted due to lz4-java on ppc64le
> ---------------------------------------------------
>
>                 Key: SPARK-20000
>                 URL: https://issues.apache.org/jira/browse/SPARK-20000
>             Project: Spark
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.2.0
>        Environment: Ubuntu 14.04 ppc64le
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>            Reporter: Sonia Garudi
>              Labels: ppc64le
>         Attachments: hs_err_pid.log
>
>
> The tests are getting aborted in the Spark Hive project with the following error:
> {code:borderStyle=solid}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x3fff94dbf114, pid=6160, tid=0x3fff6efef1a0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_111-b14) (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.111-b14 mixed mode linux-ppc64 compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x56f114]
> {code}
> In the thread log file, I found the following traces:
> Event: 3669.042 Thread 0x3fff89976800 Exception 'java/lang/NoClassDefFoundError': Could not initialize class net.jpountz.lz4.LZ4JNI (0x00079fcda3b8) thrown at [/build/openjdk-8-fVIxxI/openjdk-8-8u111-b14/src/hotspot/src/share/vm/oops/instanceKlass.cpp, line 890]
> This error is due to lz4-java (version 1.3.0), which doesn't have support for ppc64le. PFA the thread log file.
[jira] [Created] (SPARK-20000) Spark Hive tests aborted due to lz4-java on ppc64le
Sonia Garudi created SPARK-20000:
------------------------------------

             Summary: Spark Hive tests aborted due to lz4-java on ppc64le
                 Key: SPARK-20000
                 URL: https://issues.apache.org/jira/browse/SPARK-20000
             Project: Spark
          Issue Type: Bug
          Components: Tests
    Affects Versions: 2.2.0
         Environment: Ubuntu 14.04 ppc64le
$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
            Reporter: Sonia Garudi


The tests are getting aborted in the Spark Hive project with the following error:

{code:borderStyle=solid}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x3fff94dbf114, pid=6160, tid=0x3fff6efef1a0
#
# JRE version: OpenJDK Runtime Environment (8.0_111-b14) (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.111-b14 mixed mode linux-ppc64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x56f114]
{code}

In the thread log file, I found the following traces:

Event: 3669.042 Thread 0x3fff89976800 Exception 'java/lang/NoClassDefFoundError': Could not initialize class net.jpountz.lz4.LZ4JNI (0x00079fcda3b8) thrown at [/build/openjdk-8-fVIxxI/openjdk-8-8u111-b14/src/hotspot/src/share/vm/oops/instanceKlass.cpp, line 890]

This error is due to lz4-java (version 1.3.0), which doesn't have support for ppc64le. PFA the thread log file.
[jira] [Updated] (SPARK-19997) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-19997: - Description: Start runing spark-sql via proxy-user on a kerberos configured hadoop cluster and metastore {code} bin/spark-sql --proxy-user hzyaoqin {code} Failed with the following err: {code:java} 17/03/17 16:05:41 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxx:9083 17/03/17 16:05:41 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:192) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270) at org.apache.spark.sql.hive.HiveExternalCatalog.(HiveExternalCatalog.scala:65) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166) at org.apache.spark.sql.internal.SharedState.(SharedState.scala:86) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at scala.Option.getOrElse(Option.scala:121) at
[jira] [Created] (SPARK-19999) Test failures in Spark Core due to java.nio.Bits.unaligned()
Sonia Garudi created SPARK-19999:
------------------------------------

             Summary: Test failures in Spark Core due to java.nio.Bits.unaligned()
                 Key: SPARK-19999
                 URL: https://issues.apache.org/jira/browse/SPARK-19999
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0, 2.2.0
         Environment: Ubuntu 14.04 ppc64le
$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
            Reporter: Sonia Garudi


There are multiple test failures in the Spark Core project with the following error message:

{code:borderStyle=solid}
java.lang.IllegalArgumentException: requirement failed: No support for unaligned Unsafe. Set spark.memory.offHeap.enabled to false.
{code}

These errors occur because java.nio.Bits.unaligned() does not return true for the ppc64le arch.
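The failing requirement comes from probing the JVM for unaligned-access support. A rough sketch of that probing pattern (an illustration of the approach, not Spark's exact `Platform` code; the fallback architecture list is an assumption for this example): reflectively call the internal `java.nio.Bits.unaligned()` method, and fall back to an allowlist of architectures when the call is unavailable or, as on ppc64le here, returns false by default.

```java
import java.lang.reflect.Method;

public class UnalignedCheck {
    // Fallback allowlist of architectures commonly assumed to handle
    // unaligned loads; used when java.nio.Bits.unaligned() cannot be called.
    static boolean archSupportsUnaligned(String arch) {
        return arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$");
    }

    public static boolean unalignedSupported() {
        try {
            // Bits is package-private, so it must be reached reflectively.
            Class<?> bits = Class.forName("java.nio.Bits");
            Method unaligned = bits.getDeclaredMethod("unaligned");
            unaligned.setAccessible(true);
            return (Boolean) unaligned.invoke(null);
        } catch (Throwable t) {
            // Reflection can fail (e.g. on JDKs that restrict internal access);
            // fall back to the architecture allowlist.
            return archSupportsUnaligned(System.getProperty("os.arch", ""));
        }
    }

    public static void main(String[] args) {
        System.out.println("unaligned access supported: " + unalignedSupported());
    }
}
```

One way the report was eventually addressable is exactly through such a fallback list: treating architectures like ppc64le that tolerate unaligned access as supported even though `Bits.unaligned()` reports false for them.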
[jira] [Updated] (SPARK-19997) proxy-user failed connecting to a kerberos configured metastore
[ https://issues.apache.org/jira/browse/SPARK-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-19997: - Description: Start runing spark-sql via proxy-user on a kerberos configured hadoop cluster and metastore {code} bin/spark-sql --proxy-user hzyaoqin {code} Failed with the following err: {code:java} 17/03/17 16:05:41 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop712.lt.163.org:9083 17/03/17 16:05:41 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:192) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270) at org.apache.spark.sql.hive.HiveExternalCatalog.(HiveExternalCatalog.scala:65) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166) at org.apache.spark.sql.internal.SharedState.(SharedState.scala:86) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at scala.Option.getOrElse(Option.scala:121) at
[jira] [Created] (SPARK-19998) BlockRDD block not found Exception add RDD id info
jianran.tfh created SPARK-19998:
-----------------------------------

             Summary: BlockRDD block not found Exception add RDD id info
                 Key: SPARK-19998
                 URL: https://issues.apache.org/jira/browse/SPARK-19998
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager
    Affects Versions: 2.1.0
            Reporter: jianran.tfh
            Priority: Trivial


The exception "java.lang.Exception: Could not compute split, block $blockId not found" doesn't include the RDD id, while the "BlockManager: Removing RDD $id" log message has only the RDD id, so it is hard to tell that the removal is the cause of the exception. It would be better if the block-not-found exception also included the RDD id.
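The proposed improvement amounts to carrying both identifiers in the error message so it can be correlated with the removal log line. A minimal sketch (the exact wording and the sample block id are illustrative, not the actual BlockRDD patch):

```java
public class BlockNotFoundMessage {
    // Build the error message with both identifiers, so the failure can be
    // matched against the "BlockManager: Removing RDD <id>" log line.
    static String message(int rddId, String blockId) {
        return "Could not compute split, block " + blockId
                + " of RDD " + rddId + " not found";
    }

    public static void main(String[] args) {
        // Hypothetical streaming block id, for illustration only.
        System.out.println(message(42, "input-0-1489673977198"));
    }
}
```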
[jira] [Created] (SPARK-19997) proxy-user failed connecting to a kerberos configured metastore
Kent Yao created SPARK-19997: Summary: proxy-user failed connecting to a kerberos configured metastore Key: SPARK-19997 URL: https://issues.apache.org/jira/browse/SPARK-19997 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Kent Yao Start runing spark-sql via proxy-user on a kerberos configured hadoop cluster and metastore {code:shell} bin/spark-sql --proxy-user hzyaoqin {code} Failed with the following err: {code:scala} 17/03/17 16:05:41 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop712.lt.163.org:9083 17/03/17 16:05:41 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:192) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270) at org.apache.spark.sql.hive.HiveExternalCatalog.(HiveExternalCatalog.scala:65) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166) at org.apache.spark.sql.internal.SharedState.(SharedState.scala:86) at
[jira] [Commented] (SPARK-18278) Support native submission of spark jobs to a kubernetes cluster
[ https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929591#comment-15929591 ]

Andrew Ash commented on SPARK-18278:
------------------------------------
As an update on this ticket:

For those not already aware, work on native Spark integration with Kubernetes has been proceeding for the past several months in the {{branch-2.1-kubernetes}} branch of https://github.com/apache-spark-on-k8s/spark, based off the 2.1.0 Apache release. We have an active core of about a half dozen contributors to the project, with a wider group of about another dozen observing. Communication happens through the issues on the GitHub repo, a dedicated room in the Kubernetes Slack, and weekly video conferences hosted by the Kubernetes Big Data SIG. The full patch set is currently about 5500 lines, with about 500 of that being user/dev documentation.

Infrastructure-wise, we have a cloud-hosted CI Jenkins instance, set up and donated by project members, which runs both unit tests and Kubernetes integration tests over the code.

We recently entered a code freeze for our release branch and are preparing a first release to the wider community, which we plan to announce on the general Spark users list. It includes the completed "phase one" portion of the design doc shared a few months ago (https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#heading=h.fua3ml5mcolt), featuring cluster mode with static allocation of executors, submission of local resources, SSL throughout, and support for JVM languages (Java/Scala).

After that release we'll continue to stabilize and improve the phase one feature set and move into a second phase of Kubernetes work. It will likely focus on support for dynamic allocation, though we haven't finalized planning for phase two yet. Working on the pluggable scheduler in SPARK-19700 may be included as well.
Interested parties are of course welcome to watch the repo, join the weekly video conferences, give the code a shot, and contribute to the project! > Support native submission of spark jobs to a kubernetes cluster > --- > > Key: SPARK-18278 > URL: https://issues.apache.org/jira/browse/SPARK-18278 > Project: Spark > Issue Type: Umbrella > Components: Build, Deploy, Documentation, Scheduler, Spark Core >Reporter: Erik Erlandson > Attachments: SPARK-18278 - Spark on Kubernetes Design Proposal.pdf > > > A new Apache Spark sub-project that enables native support for submitting > Spark applications to a kubernetes cluster. The submitted application runs > in a driver executing on a kubernetes pod, and executors lifecycles are also > managed as pods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19882) Pivot with null as the pivot value throws NPE
[ https://issues.apache.org/jira/browse/SPARK-19882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-19882: --- Assignee: Andrew Ray > Pivot with null as the pivot value throws NPE > - > > Key: SPARK-19882 > URL: https://issues.apache.org/jira/browse/SPARK-19882 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Assignee: Andrew Ray > Fix For: 2.2.0 > > > This seems to be a regression. > - Spark 1.6 > {code} > Seq(Tuple1(None), > Tuple1(Some(1))).toDF("a").groupBy($"a").pivot("a").count().show() > {code} > prints > {code} > +++---+ > | a|null| 1| > +++---+ > |null| 0| 0| > | 1| 0| 1| > +++---+ > {code} > - Current master > {code} > Seq(Tuple1(None), > Tuple1(Some(1))).toDF("a").groupBy($"a").pivot("a").count().show() > {code} > prints > {code} > java.lang.NullPointerException was thrown. > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.PivotFirst$$anonfun$4.apply(PivotFirst.scala:145) > at > org.apache.spark.sql.catalyst.expressions.aggregate.PivotFirst$$anonfun$4.apply(PivotFirst.scala:143) > at scala.collection.immutable.List.map(List.scala:273) > at > org.apache.spark.sql.catalyst.expressions.aggregate.PivotFirst.(PivotFirst.scala:143) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot$$anonfun$apply$7$$anonfun$24.apply(Analyzer.scala:509) > {code}
[jira] [Resolved] (SPARK-19882) Pivot with null as the pivot value throws NPE
[ https://issues.apache.org/jira/browse/SPARK-19882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19882. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17226 [https://github.com/apache/spark/pull/17226] > Pivot with null as the pivot value throws NPE > - > > Key: SPARK-19882 > URL: https://issues.apache.org/jira/browse/SPARK-19882 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon > Fix For: 2.2.0 > > > This seems to be a regression. > - Spark 1.6 > {code} > Seq(Tuple1(None), > Tuple1(Some(1))).toDF("a").groupBy($"a").pivot("a").count().show() > {code} > prints > {code} > +++---+ > | a|null| 1| > +++---+ > |null| 0| 0| > | 1| 0| 1| > +++---+ > {code} > - Current master > {code} > Seq(Tuple1(None), > Tuple1(Some(1))).toDF("a").groupBy($"a").pivot("a").count().show() > {code} > prints > {code} > java.lang.NullPointerException was thrown. > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.PivotFirst$$anonfun$4.apply(PivotFirst.scala:145) > at > org.apache.spark.sql.catalyst.expressions.aggregate.PivotFirst$$anonfun$4.apply(PivotFirst.scala:143) > at scala.collection.immutable.List.map(List.scala:273) > at > org.apache.spark.sql.catalyst.expressions.aggregate.PivotFirst.(PivotFirst.scala:143) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot$$anonfun$apply$7$$anonfun$24.apply(Analyzer.scala:509) > {code}
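The stack trace above points into PivotFirst, where each distinct pivot value is resolved to an output column position. The essence of that lookup can be sketched in plain Python (a hypothetical illustration, not Spark's actual code): the value-to-column mapping must tolerate null/None as a key, which is what the fix restores.

```python
def pivot_column_index(pivot_values):
    """Map each distinct pivot value, including None, to an output
    column position. Keying a dict directly on the values accepts
    None, avoiding the null-lookup failure seen in the trace above.
    Hypothetical sketch, not Spark's PivotFirst implementation."""
    return {value: i for i, value in enumerate(pivot_values)}

# Null as a pivot value should resolve to a column, not throw
columns = pivot_column_index([None, 1])
```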
[jira] [Updated] (SPARK-19927) SparkThriftServer2 can not get ''--hivevar" variables in spark 2.1
[ https://issues.apache.org/jira/browse/SPARK-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bruce xu updated SPARK-19927: - Description: suppose the content of file test1.sql: - USE ${hivevar:db_name}; - when execute command: bin/spark-sql -f /tmp/test.sql --hivevar db_name=offline the output is: Error: org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input ''(line 1, pos 4) == SQL == use ^^^ (state=,code=0) - so the parameter --hivevar can not be read from CLI. the bug still appears with beeline command: bin/beeline -f /tmp/test2.sql --hivevar db_name=offline with test2.sql: !connect jdbc:hive2://localhost:1 test test USE ${hivevar:db_name}; -- was: suppose the content of test1.sql: - USE ${hivevar:db_name}; - when execute: bin/spark-sql -f /tmp/test.sql --hivevar db_name=offline the output is: Error: org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input ''(line 1, pos 4) == SQL == use ^^^ (state=,code=0) - so hivevar can not be read from CLI. 
The bug still appears with the beeline command: bin/beeline -f /tmp/test2.sql --hivevar db_name=offline with test2.sql: !connect jdbc:hive2://localhost:1 test test USE ${hivevar:db_name}; > SparkThriftServer2 can not get ''--hivevar" variables in spark 2.1 > -- > > Key: SPARK-19927 > URL: https://issues.apache.org/jira/browse/SPARK-19927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.1.0 > Environment: CentOS 6.5,spark 2.1 build with mvn -Pyarn -Phadoop-2.6 > -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -Dscala-2.11 >Reporter: bruce xu > > suppose the content of file test1.sql: > - > USE ${hivevar:db_name}; > - > > when executing the command: bin/spark-sql -f /tmp/test.sql --hivevar > db_name=offline > the output is: > > Error: org.apache.spark.sql.catalyst.parser.ParseException: > no viable alternative at input ''(line 1, pos 4) > == SQL == > use > ^^^ (state=,code=0) > - > so the parameter --hivevar cannot be read from the CLI. > the bug still appears with the beeline command: bin/beeline -f /tmp/test2.sql > --hivevar db_name=offline with test2.sql: > > !connect jdbc:hive2://localhost:1 test test > USE ${hivevar:db_name}; > --
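Until the CLI honors --hivevar again, one client-side workaround is to substitute the variables into the script before handing it to spark-sql or beeline. A minimal sketch of that substitution (a hypothetical helper, not part of Spark):

```python
import re

def substitute_hivevars(sql, hivevars):
    """Replace ${hivevar:name} placeholders in a SQL script before it
    reaches spark-sql -- performing client-side the substitution the
    CLI fails to do. `hivevars` maps variable names to values."""
    def replace(match):
        name = match.group(1)
        if name not in hivevars:
            raise KeyError("undefined hivevar: " + name)
        return hivevars[name]
    return re.sub(r"\$\{hivevar:(\w+)\}", replace, sql)

# Mirrors the quoted repro: --hivevar db_name=offline
rewritten = substitute_hivevars("USE ${hivevar:db_name};", {"db_name": "offline"})
```

The rewritten script ("USE offline;") can then be written to a temp file and passed with -f as usual.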
[jira] [Commented] (SPARK-19996) transfer spark-defaults.conf to spark-defaults.xml
[ https://issues.apache.org/jira/browse/SPARK-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929580#comment-15929580 ] Sean Owen commented on SPARK-19996: --- What is this about? The conf file isn't going to be switched to XML. > transfer spark-defaults.conf to spark-defaults.xml > -- > > Key: SPARK-19996 > URL: https://issues.apache.org/jira/browse/SPARK-19996 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: hanzhi >
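One reason the conf file stays plain text: spark-defaults.conf is a simple properties format, one key/value pair per line, which needs no XML machinery. A toy parser sketch of that format (assumptions: whitespace or '=' as the separator, '#' comments; this is an illustration, not Spark's actual loader):

```python
import re

def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style content: one 'key value' or
    'key=value' pair per line, blank lines and '#' comments skipped.
    A sketch of the plain-properties format, not Spark's loader."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"([^\s=]+)[=\s]+(.*)", line)
        if match:
            conf[match.group(1)] = match.group(2).strip()
    return conf

sample = """
# Default system properties included when running spark-submit.
spark.eventLog.enabled=true
spark.yarn.jars local:/usr/local/spark/yarn/*
"""
conf = parse_spark_defaults(sample)
```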
[jira] [Updated] (SPARK-19927) SparkThriftServer2 can not get ''--hivevar" variables in spark 2.1
[ https://issues.apache.org/jira/browse/SPARK-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bruce xu updated SPARK-19927: - Affects Version/s: 2.0.1 > SparkThriftServer2 can not get ''--hivevar" variables in spark 2.1 > -- > > Key: SPARK-19927 > URL: https://issues.apache.org/jira/browse/SPARK-19927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.1.0 > Environment: CentOS 6.5,spark 2.1 build with mvn -Pyarn -Phadoop-2.6 > -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -Dscala-2.11 >Reporter: bruce xu > > suppose the content of test1.sql: > - > USE ${hivevar:db_name}; > - > > when execute: bin/spark-sql -f /tmp/test.sql --hivevar db_name=offline > the output is: > > Error: org.apache.spark.sql.catalyst.parser.ParseException: > no viable alternative at input ''(line 1, pos 4) > == SQL == > use > ^^^ (state=,code=0) > - > so hivevar can not be read from CLI. > the bug still appears with beeline command: bin/beeline -f /tmp/test2.sql > --hivevar db_name=offline with test2.sql: > > !connect jdbc:hive2://localhost:1 test test > USE ${hivevar:db_name}; >
[jira] [Created] (SPARK-19996) transfer spark-defaults.conf to spark-defaults.xml
hanzhi created SPARK-19996: -- Summary: transfer spark-defaults.conf to spark-defaults.xml Key: SPARK-19996 URL: https://issues.apache.org/jira/browse/SPARK-19996 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.1.0 Reporter: hanzhi
[jira] [Created] (SPARK-19995) Using real user to connect HiveMetastore in HiveClientImpl
Saisai Shao created SPARK-19995: --- Summary: Using real user to connect HiveMetastore in HiveClientImpl Key: SPARK-19995 URL: https://issues.apache.org/jira/browse/SPARK-19995 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0 Reporter: Saisai Shao If a user specifies "--proxy-user" in a kerberized environment with the Hive catalog implementation, HiveClientImpl will try to connect to the Hive metastore as the current user. Since we use the real user to do kinit, this makes the connection fail. We should change this, as we did before in the YARN code, to use the real user. {noformat} ERROR TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:188) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270) at org.apache.spark.sql.hive.HiveExternalCatalog.(HiveExternalCatalog.scala:65) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173) at org.apache.spark.sql.internal.SharedState.(SharedState.scala:86) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
[jira] [Comment Edited] (SPARK-19992) spark-submit on deployment-mode cluster
[ https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929547#comment-15929547 ] Saisai Shao edited comment on SPARK-19992 at 3/17/17 7:48 AM: -- Looks like I guessed wrong from the URL you provided. If you're trying to use "spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*", make sure the Spark-related jars exist under the same path on every node. was (Author: jerryshao): Looks like I guess wrong from the url you provided. If you're trying to use "spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*", make sure these jars existed in every node. > spark-submit on deployment-mode cluster > --- > > Key: SPARK-19992 > URL: https://issues.apache.org/jira/browse/SPARK-19992 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.2 > Environment: spark version 2.0.2 > hadoop version 2.6.0 >Reporter: narendra maru > > spark version 2.0.2 > hadoop version 2.6.0 > spark -submit command > "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode > cluster --jars /home/ec2-user/jars/hgmongonew.jar, > /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar" > after adding following in > 1 Spark-default.conf > spark.executor.extraJavaOptions -Dconfig.fuction.conf > spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/* > spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory > spark.eventLog.enabled=true > 2yarn-site.xml > > yarn.application.classpath > > /usr/local/hadoop-2.6.0/etc/hadoop, > /usr/local/hadoop-2.6.0/, > /usr/local/hadoop-2.6.0/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/common/, > /usr/local/hadoop-2.6.0/share/hadoop/common/lib/ > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/, > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/, > 
/usr/local/hadoop-2.6.0/share/hadoop/yarn/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*, > /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar > > > Error on log:- > Error: Could not find or load main class > org.apache.spark.deploy.yarn.ApplicationMaster > Error on terminal:- > diagnostics: Application application_1489673977198_0002 failed 2 times due to > AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 > For more detailed output, check application tracking > page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then, > click on links to logs of each attempt.
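Saisai's point above can be checked mechanically: with the local: scheme, YARN does not ship the jars, so the path must already resolve to files on every node, and an empty match on any node would explain the missing ApplicationMaster class. A hypothetical preflight helper (not part of Spark) that one could run on each host:

```python
import glob

def check_local_jars(spark_yarn_jars):
    """Resolve a spark.yarn.jars value using the local: scheme on the
    current host and return the jars the glob pattern matches. With
    local:, YARN does not distribute the jars -- every node must
    already hold them at the same path -- so an empty result here on
    any node would explain 'Could not find or load main class
    org.apache.spark.deploy.yarn.ApplicationMaster'. Hypothetical
    preflight helper, not part of Spark."""
    prefix = "local:"
    if not spark_yarn_jars.startswith(prefix):
        raise ValueError("expected a local: URI, got " + spark_yarn_jars)
    return sorted(glob.glob(spark_yarn_jars[len(prefix):]))
```

Running it with the value from this report, e.g. check_local_jars("local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*"), on each node shows immediately which hosts are missing the jars.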
[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster
[ https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929547#comment-15929547 ] Saisai Shao commented on SPARK-19992: - Looks like I guessed wrong from the URL you provided. If you're trying to use "spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*", make sure these jars exist on every node. > spark-submit on deployment-mode cluster > --- > > Key: SPARK-19992 > URL: https://issues.apache.org/jira/browse/SPARK-19992 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.2 > Environment: spark version 2.0.2 > hadoop version 2.6.0 >Reporter: narendra maru > > spark version 2.0.2 > hadoop version 2.6.0 > spark -submit command > "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode > cluster --jars /home/ec2-user/jars/hgmongonew.jar, > /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar" > after adding following in > 1 Spark-default.conf > spark.executor.extraJavaOptions -Dconfig.fuction.conf > spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/* > spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory > spark.eventLog.enabled=true > 2yarn-site.xml > > yarn.application.classpath > > /usr/local/hadoop-2.6.0/etc/hadoop, > /usr/local/hadoop-2.6.0/, > /usr/local/hadoop-2.6.0/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/common/, > /usr/local/hadoop-2.6.0/share/hadoop/common/lib/ > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/, > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*, > /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar > > > Error on log:- > Error: Could not find or load main class > org.apache.spark.deploy.yarn.ApplicationMaster > Error on 
terminal:- > diagnostics: Application application_1489673977198_0002 failed 2 times due to > AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 > For more detailed output, check application tracking > page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then, > click on links to logs of each attempt.
[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster
[ https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929546#comment-15929546 ] narendra maru commented on SPARK-19992: --- Hi Saisai, I am working on plain Hadoop 2.6.0, so is there a need for any other configuration? > spark-submit on deployment-mode cluster > --- > > Key: SPARK-19992 > URL: https://issues.apache.org/jira/browse/SPARK-19992 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.2 > Environment: spark version 2.0.2 > hadoop version 2.6.0 >Reporter: narendra maru > > spark version 2.0.2 > hadoop version 2.6.0 > spark -submit command > "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode > cluster --jars /home/ec2-user/jars/hgmongonew.jar, > /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar" > after adding following in > 1 Spark-default.conf > spark.executor.extraJavaOptions -Dconfig.fuction.conf > spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/* > spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory > spark.eventLog.enabled=true > 2yarn-site.xml > > yarn.application.classpath > > /usr/local/hadoop-2.6.0/etc/hadoop, > /usr/local/hadoop-2.6.0/, > /usr/local/hadoop-2.6.0/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/common/, > /usr/local/hadoop-2.6.0/share/hadoop/common/lib/ > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/, > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*, > /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar > > > Error on log:- > Error: Could not find or load main class > org.apache.spark.deploy.yarn.ApplicationMaster > Error on terminal:- > diagnostics: Application application_1489673977198_0002 failed 2 times due to > AM 
Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 > For more detailed output, check application tracking > page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then, > click on links to logs of each attempt.
[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster
[ https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929545#comment-15929545 ] narendra maru commented on SPARK-19992: --- Thanks Sean for the quick reply. I have been struggling with this same error for the last 4-5 days and have not found any way to resolve it. Can you please help me with any suggestions for Spark deployment on a YARN multinode cluster, as the same command works in deploy mode client? > spark-submit on deployment-mode cluster > --- > > Key: SPARK-19992 > URL: https://issues.apache.org/jira/browse/SPARK-19992 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.2 > Environment: spark version 2.0.2 > hadoop version 2.6.0 >Reporter: narendra maru > > spark version 2.0.2 > hadoop version 2.6.0 > spark -submit command > "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode > cluster --jars /home/ec2-user/jars/hgmongonew.jar, > /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar" > after adding following in > 1 Spark-default.conf > spark.executor.extraJavaOptions -Dconfig.fuction.conf > spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/* > spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory > spark.eventLog.enabled=true > 2yarn-site.xml > > yarn.application.classpath > > /usr/local/hadoop-2.6.0/etc/hadoop, > /usr/local/hadoop-2.6.0/, > /usr/local/hadoop-2.6.0/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/common/, > /usr/local/hadoop-2.6.0/share/hadoop/common/lib/ > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/, > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*, > /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar > > > Error on log:- > Error: Could 
not find or load main class > org.apache.spark.deploy.yarn.ApplicationMaster > Error on terminal:- > diagnostics: Application application_1489673977198_0002 failed 2 times due to > AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 > For more detailed output, check application tracking > page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then, > click on links to logs of each attempt.
[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster
[ https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929544#comment-15929544 ] Saisai Shao commented on SPARK-19992: - Are you using an HDP environment? If so, I guess you need to configure hdp.version in Spark; you could google it. > spark-submit on deployment-mode cluster > --- > > Key: SPARK-19992 > URL: https://issues.apache.org/jira/browse/SPARK-19992 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.2 > Environment: spark version 2.0.2 > hadoop version 2.6.0 >Reporter: narendra maru > > spark version 2.0.2 > hadoop version 2.6.0 > spark -submit command > "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode > cluster --jars /home/ec2-user/jars/hgmongonew.jar, > /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar" > after adding following in > 1 Spark-default.conf > spark.executor.extraJavaOptions -Dconfig.fuction.conf > spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/* > spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory > spark.eventLog.enabled=true > 2yarn-site.xml > > yarn.application.classpath > > /usr/local/hadoop-2.6.0/etc/hadoop, > /usr/local/hadoop-2.6.0/, > /usr/local/hadoop-2.6.0/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/common/, > /usr/local/hadoop-2.6.0/share/hadoop/common/lib/ > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/, > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*, > /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar > > > Error on log:- > Error: Could not find or load main class > org.apache.spark.deploy.yarn.ApplicationMaster > Error on terminal:- > diagnostics: Application application_1489673977198_0002 failed 2 
times due to > AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 > For more detailed output, check application tracking > page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then, > click on links to logs of each attempt.
[jira] [Commented] (SPARK-19992) spark-submit on deployment-mode cluster
[ https://issues.apache.org/jira/browse/SPARK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929529#comment-15929529 ] Sean Owen commented on SPARK-19992: --- You have some build or environment problem. In particular I don't know that this is valid: spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/* > spark-submit on deployment-mode cluster > --- > > Key: SPARK-19992 > URL: https://issues.apache.org/jira/browse/SPARK-19992 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.2 > Environment: spark version 2.0.2 > hadoop version 2.6.0 >Reporter: narendra maru > > spark version 2.0.2 > hadoop version 2.6.0 > spark -submit command > "spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode > cluster --jars /home/ec2-user/jars/hgmongonew.jar, > /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar" > after adding following in > 1 Spark-default.conf > spark.executor.extraJavaOptions -Dconfig.fuction.conf > spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/* > spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory > spark.eventLog.enabled=true > 2yarn-site.xml > > yarn.application.classpath > > /usr/local/hadoop-2.6.0/etc/hadoop, > /usr/local/hadoop-2.6.0/, > /usr/local/hadoop-2.6.0/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/common/, > /usr/local/hadoop-2.6.0/share/hadoop/common/lib/ > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/, > /usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/, > /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/tools/lib/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/, > /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*, > /usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar > > > Error on log:- > Error: Could not find or load main class > org.apache.spark.deploy.yarn.ApplicationMaster > Error on terminal:- > diagnostics: 
Application application_1489673977198_0002 failed 2 times due to > AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1 > For more detailed output, check application tracking > page:http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/Then, > click on links to logs of each attempt.
[jira] [Created] (SPARK-19994) Wrong outputOrdering for right/full outer smj
Zhenhua Wang created SPARK-19994: Summary: Wrong outputOrdering for right/full outer smj Key: SPARK-19994 URL: https://issues.apache.org/jira/browse/SPARK-19994 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0 Reporter: Zhenhua Wang For right outer join, values of the left key will be filled with nulls if they can't match the value of the right key, so `nullOrdering` of the left key can't be guaranteed. We should output the right key order. For full outer join, neither the left key nor the right key guarantees `nullOrdering`. We should not output any ordering.
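The ordering argument above can be seen in a toy merge (plain Python, not Spark's SortMergeJoinExec): after a right outer join, the right-key column stays sorted, while the left-key column is interleaved with nulls, so no nulls-first or nulls-last ordering on the left key can hold.

```python
def right_outer_join(left_keys, right_keys):
    """Toy single-column right outer join: emit every right key in
    sorted order, paired with a matching left key or None. A model of
    the outputOrdering argument, not Spark's sort-merge join."""
    matches = set(left_keys)
    return [(k if k in matches else None, k) for k in sorted(right_keys)]

rows = right_outer_join(left_keys=[2, 4], right_keys=[1, 2, 3, 4])
# rows: [(None, 1), (2, 2), (None, 3), (4, 4)]
# right-key column is sorted; left-key column mixes None between
# values, so neither nulls-first nor nulls-last ordering holds for it
```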
[jira] [Comment Edited] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929496#comment-15929496 ] Takeshi Yamamuro edited comment on SPARK-19984 at 3/17/17 6:41 AM: --- If we could find a reason for this issue, it'd be better to update the title more concretely. Currently, I feel it is too ambiguous. was (Author: maropu): If we could find reason about this issue, it'd be better to update the title more concretely. Currently, I feel it is too ambiguous. > ERROR codegen.CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java' > - > > Key: SPARK-19984 > URL: https://issues.apache.org/jira/browse/SPARK-19984 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.1.0 >Reporter: Andrey Yakovenko > > I had this error a few times on my local hadoop 2.7.3+Spark2.1.0 environment. > This is not a permanent error; the next time I run it, it could disappear. > Unfortunately I don't know how to reproduce the issue. As you can see from > the log, my logic is pretty complicated. 
> Here is a part of the log I've got (container_1489514660953_0015_01_01)
> {code}
> 17/03/16 11:07:04 ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 151, Column 29: A method named "compare" is not declared in any enclosing class nor any supertype, nor through a static import
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg;
> /* 009 */   private boolean agg_bufIsNull;
> /* 010 */   private long agg_bufValue;
> /* 011 */   private boolean agg_initAgg1;
> /* 012 */   private boolean agg_bufIsNull1;
> /* 013 */   private long agg_bufValue1;
> /* 014 */   private scala.collection.Iterator smj_leftInput;
> /* 015 */   private scala.collection.Iterator smj_rightInput;
> /* 016 */   private InternalRow smj_leftRow;
> /* 017 */   private InternalRow smj_rightRow;
> /* 018 */   private UTF8String smj_value2;
> /* 019 */   private java.util.ArrayList smj_matches;
> /* 020 */   private UTF8String smj_value3;
> /* 021 */   private UTF8String smj_value4;
> /* 022 */   private org.apache.spark.sql.execution.metric.SQLMetric smj_numOutputRows;
> /* 023 */   private UnsafeRow smj_result;
> /* 024 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
> /* 025 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter smj_rowWriter;
> /* 026 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows;
> /* 027 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime;
> /* 028 */   private UnsafeRow agg_result;
> /* 029 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
> /* 030 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;
> /* 031 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows1;
> /* 032 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime1;
> /* 033 */   private UnsafeRow agg_result1;
> /* 034 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder1;
> /* 035 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter1;
> /* 036 */
> /* 037 */   public GeneratedIterator(Object[] references) {
> /* 038 */     this.references = references;
> /* 039 */   }
> /* 040 */
> /* 041 */   public void init(int index, scala.collection.Iterator[] inputs) {
> /* 042 */     partitionIndex = index;
> /* 043 */     this.inputs = inputs;
> /* 044 */     wholestagecodegen_init_0();
> /* 045 */     wholestagecodegen_init_1();
> /* 046 */
> /* 047 */   }
> /* 048 */
> /* 049 */   private void wholestagecodegen_init_0() {
> /* 050 */     agg_initAgg = false;
> /* 051 */
> /* 052 */     agg_initAgg1 = false;
> /* 053 */
> /* 054 */     smj_leftInput = inputs[0];
> /* 055 */     smj_rightInput = inputs[1];
> /* 056 */
> /* 057 */     smj_rightRow = null;
> /* 058 */
> /* 059 */     smj_matches = new java.util.ArrayList();
> /* 060 */
> /* 061 */     this.smj_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
> /* 062 */     smj_result = new UnsafeRow(2);
> /* 063 */     this.smj_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(smj_result, 64);
> /* 064 */     this.smj_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(smj_holder, 2);
> /* 065 */
[jira] [Created] (SPARK-19993) Caching logical plans containing subquery expressions does not work.
Dilip Biswal created SPARK-19993:
Summary: Caching logical plans containing subquery expressions does not work.
Key: SPARK-19993
URL: https://issues.apache.org/jira/browse/SPARK-19993
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 2.1.0
Reporter: Dilip Biswal

Here is a simple repro that depicts the problem. In this case, the second invocation of the SQL should have been served from the cache; however, the lookup currently fails.
{code}
scala> val ds = spark.sql("select * from s1 where s1.c1 in (select s2.c1 from s2 where s1.c1 = s2.c1)")
ds: org.apache.spark.sql.DataFrame = [c1: int]

scala> ds.cache
res13: ds.type = [c1: int]

scala> spark.sql("select * from s1 where s1.c1 in (select s2.c1 from s2 where s1.c1 = s2.c1)").explain(true)
== Analyzed Logical Plan ==
c1: int
Project [c1#86]
+- Filter c1#86 IN (list#78 [c1#86])
   :  +- Project [c1#87]
   :     +- Filter (outer(c1#86) = c1#87)
   :        +- SubqueryAlias s2
   :           +- Relation[c1#87] parquet
   +- SubqueryAlias s1
      +- Relation[c1#86] parquet

== Optimized Logical Plan ==
Join LeftSemi, ((c1#86 = c1#87) && (c1#86 = c1#87))
:- Relation[c1#86] parquet
+- Relation[c1#87] parquet
{code}

-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
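A quick way to confirm whether the second invocation actually hits the cache is to look for an InMemoryTableScan node in the physical plan. This is a hedged sketch, not part of the original report: it assumes the same spark-shell session and the same `s1`/`s2` parquet tables as the repro above.

```scala
// Sketch (spark-shell, Spark 2.1): if the cache lookup worked, the physical
// plan of the second query would contain an InMemoryTableScan node; with this
// bug, the subquery expression defeats plan matching and it does not.
val q = "select * from s1 where s1.c1 in (select s2.c1 from s2 where s1.c1 = s2.c1)"
spark.sql(q).cache()  // registers the plan (including the subquery expression) in the cache

val plan = spark.sql(q).queryExecution.executedPlan
val servedFromCache = plan.toString.contains("InMemoryTableScan")
println(s"served from cache: $servedFromCache")  // should be true once this issue is fixed
```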
[jira] [Created] (SPARK-19992) spark-submit on deployment-mode cluster
narendra maru created SPARK-19992:
Summary: spark-submit on deployment-mode cluster
Key: SPARK-19992
URL: https://issues.apache.org/jira/browse/SPARK-19992
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 2.0.2
Environment: spark version 2.0.2, hadoop version 2.6.0
Reporter: narendra maru

spark version 2.0.2
hadoop version 2.6.0

spark-submit command:
"spark-submit --class spark.mongohadoop.testing3 --master yarn --deploy-mode cluster --jars /home/ec2-user/jars/hgmongonew.jar, /home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar"

after adding the following:

1. in spark-defaults.conf:
spark.executor.extraJavaOptions -Dconfig.fuction.conf
spark.yarn.jars=local:/usr/local/spark-2.0.2-bin-hadoop2.6/yarn/*
spark.eventLog.dir=hdfs://localhost:9000/user/spark/applicationHistory
spark.eventLog.enabled=true

2. in yarn-site.xml, yarn.application.classpath:
/usr/local/hadoop-2.6.0/etc/hadoop,
/usr/local/hadoop-2.6.0/,
/usr/local/hadoop-2.6.0/lib/,
/usr/local/hadoop-2.6.0/share/hadoop/common/,
/usr/local/hadoop-2.6.0/share/hadoop/common/lib/,
/usr/local/hadoop-2.6.0/share/hadoop/hdfs/,
/usr/local/hadoop-2.6.0/share/hadoop/hdfs/lib/,
/usr/local/hadoop-2.6.0/share/hadoop/mapreduce/,
/usr/local/hadoop-2.6.0/share/hadoop/mapreduce/lib/,
/usr/local/hadoop-2.6.0/share/hadoop/tools/lib/,
/usr/local/hadoop-2.6.0/share/hadoop/yarn/,
/usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/*,
/usr/local/spark-2.0.2-bin-hadoop2.6/jars/spark-yarn_2.11-2.0.2.jar

Error in log:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Error on terminal:
diagnostics: Application application_1489673977198_0002 failed 2 times due to AM Container for appattempt_1489673977198_0002_02 exited with exitCode: 1
For more detailed output, check application tracking page: http://bdg-hdp-sparkmaster:8088/proxy/application_1489673977198_0002/ Then, click on links to logs of each attempt.
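Two things in the command and configuration above look suspect, so here is a hedged sketch of a corrected submission (not a confirmed fix). First, `--jars` takes a single comma-separated value with no spaces; the space after the comma makes spark-submit treat the second jar as a separate argument. Second, `spark.yarn.jars` pointing at `yarn/*` would not include the spark-yarn jar that contains `org.apache.spark.deploy.yarn.ApplicationMaster`; in Spark 2.x the Spark jars live under `$SPARK_HOME/jars`. The application jar path below is a hypothetical placeholder, since the original command omits one.

```shell
# Sketch of a corrected submission (assumptions hedged above):
# - --jars: comma-separated, NO space after the comma
# - spark.yarn.jars: cover everything under $SPARK_HOME/jars, which is where
#   spark-yarn_2.11-2.0.2.jar (and hence ApplicationMaster) actually lives
# - /path/to/application.jar is a hypothetical placeholder for the app jar
spark-submit \
  --class spark.mongohadoop.testing3 \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.jars="local:/usr/local/spark-2.0.2-bin-hadoop2.6/jars/*" \
  --jars /home/ec2-user/jars/hgmongonew.jar,/home/ec2-user/jars/mongo-hadoop-spark-2.0.1.jar \
  /path/to/application.jar
```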