[jira] [Comment Edited] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519445#comment-16519445 ] Thomas Graves edited comment on SPARK-24552 at 6/21/18 3:02 PM:

More details on the Hadoop committer side: I think the commit/delete problem is also an issue for the existing v1 and Hadoop committers, so this doesn't fully solve the problem. Spark uses a file format like (HadoopMapReduceWriteConfigUtil/HadoopMapRedWriteConfigUtil):
{code:java}
{date}_{rddid}_{m/r}_{partitionid}_{task attempt number}
{code}
I believe the same fix as for v2 would work here: use the taskAttemptId instead of the attemptNumber. When we have a stage failure and a second stage attempt, the task attempt number can be the same, so both tasks write to the same place. If one of them fails or is told not to commit, it could delete the output that is being used by both. We need to think through all the scenarios to make sure it's covered.
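The collision described above can be sketched as follows. This is an illustrative Python sketch, not Spark's actual code; the date, rdd id, and taskAttemptId values are made up.

```python
# Why an output path built from the stage-local attempt number collides
# across stage attempts, while a globally unique task attempt id does not.

def output_path(date, rdd_id, kind, partition_id, attempt):
    # Mirrors the {date}_{rddid}_{m/r}_{partitionid}_{attempt} pattern above
    return f"{date}_{rdd_id}_{kind}_{partition_id}_{attempt}"

# Stage attempt 0 and the retried stage attempt 1 both run partition 3
# with attemptNumber 0, producing identical paths.
p0 = output_path("20180621", 42, "r", 3, attempt=0)   # stage attempt 0
p1 = output_path("20180621", 42, "r", 3, attempt=0)   # stage attempt 1 (retry)
assert p0 == p1  # collision: both tasks write to the same place

# Using the globally unique taskAttemptId instead avoids the collision.
q0 = output_path("20180621", 42, "r", 3, attempt=1001)  # 1st task's taskAttemptId
q1 = output_path("20180621", 42, "r", 3, attempt=2007)  # 2nd task's taskAttemptId
assert q0 != q1
```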
> Task attempt numbers are reused when stages are retried
> ---
>
> Key: SPARK-24552
> URL: https://issues.apache.org/jira/browse/SPARK-24552
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.1, 2.2.0, 2.2.1, 2.3.0, 2.3.1
> Reporter: Ryan Blue
> Priority: Blocker
>
> When stages are retried due to shuffle failures, task attempt numbers are reused. This causes a correctness bug in the v2 data sources write path.
> Data sources (both the original and v2) pass the task attempt to writers so that writers can use the attempt number to track and clean up data from failed or speculative attempts. In the v2 docs for DataWriterFactory, the attempt number's javadoc states that "Implementations can use this attempt number to distinguish writers of different task attempts."
> When two attempts of a stage use the same (partition, attempt) pair, two tasks can create the same data and attempt to commit. The commit coordinator prevents both from committing and will abort the attempt that finishes last. When using the (partition, attempt) pair to track data, the aborted task may delete data associated with that pair. If that happens, the data for the task that committed is deleted as well, which is a correctness bug.
> For a concrete example, I have a data source that creates files in place named with {{part---.}}. Because these files are written in place, both tasks create the same file, and the one that is aborted deletes the file, leading to data corruption when the file is added to the table.

-- 
This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
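The data-loss sequence in the description above can be simulated in a few lines. This is a hedged sketch under assumed names (the `part-{partition}-{attempt}` naming is hypothetical), not the data source's real implementation: two task attempts from different stage attempts share a (partition, attempt) pair, only one is allowed to commit, and the aborted one deletes the shared in-place file.

```python
# Simulate an in-place writer where cleanup of an aborted attempt
# destroys the committed output of the winning attempt.

files = {}  # simulated in-place file system: name -> contents

def write(partition, attempt, data):
    name = f"part-{partition}-{attempt}"   # hypothetical in-place naming
    files[name] = data
    return name

def abort(name):
    files.pop(name, None)  # aborted attempt deletes the file it "owns"

f1 = write(partition=0, attempt=0, data="task from stage attempt 0")
f2 = write(partition=0, attempt=0, data="task from stage attempt 1")
assert f1 == f2            # same (partition, attempt) -> same file

# The commit coordinator authorizes the first finisher; the other aborts.
abort(f2)                  # but f2 names the SAME file the winner committed
assert f1 not in files     # committed data is gone: the correctness bug
```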
[jira] [Commented] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519445#comment-16519445 ] Thomas Graves commented on SPARK-24552: ---

More details on the Hadoop committer side: I think the commit/delete problem is also an issue for the existing v1 and Hadoop committers, so this doesn't fully solve the problem. Spark uses a file format like (HadoopMapReduceWriteConfigUtil/HadoopMapRedWriteConfigUtil):
{quote}{date}_{rddid}_{m/r}_{partitionid}_{task attempt number}{quote}
I believe the same fix as for v2 would work here: use the taskAttemptId instead of the attemptNumber. When we have a stage failure and a second stage attempt, the task attempt number can be the same, so both tasks write to the same place. If one of them fails or is told not to commit, it could delete the output that is being used by both. We need to think through all the scenarios to make sure it's covered.
[jira] [Commented] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519430#comment-16519430 ] Thomas Graves commented on SPARK-24552: --- This is actually a problem with the Hadoop committers as well, v1 and v2.
[jira] [Commented] (SPARK-24611) Clean up OutputCommitCoordinator
[ https://issues.apache.org/jira/browse/SPARK-24611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519420#comment-16519420 ] Thomas Graves commented on SPARK-24611: --- [~joshrosen] Just noticed you were the last one to modify the DAGScheduler for the output commit coordinator ([https://github.com/apache/spark/commit/d0b56339625727744e2c30fc2167bc6a457d37f7]), where it split ShuffleMapStage from ResultStage handling. Do you know of any case where a ShuffleMapStage actually calls into canCommit?

> Clean up OutputCommitCoordinator
>
> Key: SPARK-24611
> URL: https://issues.apache.org/jira/browse/SPARK-24611
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Marcelo Vanzin
> Priority: Major
>
> This is a follow-up to SPARK-24589, to address some issues brought up during review of the change:
> - the DAGScheduler registers all stages with the coordinator, when at first view only result stages need to. That would save memory in the driver.
> - the coordinator can track task IDs instead of the internal "TaskIdentifier" type it uses; that would also save some memory, and be more accurate.
> - {{TaskCommitDenied}} currently has a "job ID" when it's really a stage ID, and it contains the task attempt number, when it should probably have the task ID instead (like above).
> The latter is an API breakage (in a class tagged as developer API, but still), and also affects data written to event logs.
[jira] [Commented] (SPARK-24622) Task attempts in other stage attempts not killed when one task attempt succeeds
[ https://issues.apache.org/jira/browse/SPARK-24622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519416#comment-16519416 ] Thomas Graves commented on SPARK-24622: --- Need to investigate further and test to make sure I am not missing anything.

> Task attempts in other stage attempts not killed when one task attempt succeeds
> ---
>
> Key: SPARK-24622
> URL: https://issues.apache.org/jira/browse/SPARK-24622
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Thomas Graves
> Priority: Major
>
> Looking through the code handling for [https://github.com/apache/spark/pull/21577], I was looking to see how we kill task attempts. I don't see anywhere that we actually kill task attempts in stage attempts other than the one that completed successfully.
>
> For instance:
> stage 0.0 (stage id 0, attempt 0)
> - task 1.0 (task 1, attempt 0)
> stage 0.1 (stage id 0, attempt 1), started due to a fetch failure, for instance
> - task 1.0 (task 1, attempt 0), the equivalent of stage 0.0's task 1.0, launched because task 1.0 in stage 0.0 didn't finish and didn't fail
>
> Now if task 1.0 in stage 0.0 succeeds, it gets committed and marked as successful. We will mark the task in stage 0.1 as completed, but nowhere in the code do I see it actually kill task 1.0 in stage 0.1.
> Note that the scheduler does handle the case where we have 2 attempts (speculation) in a single stage attempt. It will kill the other attempt when one of them succeeds. See TaskSetManager.handleSuccessfulTask.
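The missing behavior can be sketched as a small simulation. This is not Spark's scheduler code; the tuple keys and the `on_success` function are illustrative names for the idea that a success should kill attempts of the same task in other stage attempts, not only speculative twins in the same task set.

```python
# Track running task attempts across stage attempts and kill all
# remaining attempts of a task once any one of them succeeds.

running = {
    # (stage_id, stage_attempt, task, task_attempt)
    (0, 0, 1, 0),   # task 1 in stage attempt 0
    (0, 1, 1, 0),   # equivalent task 1 re-launched in stage attempt 1
}

def on_success(stage_id, stage_attempt, task, task_attempt):
    running.discard((stage_id, stage_attempt, task, task_attempt))
    # Kill every other running attempt of the same (stage, task),
    # regardless of which stage attempt launched it.
    for key in [k for k in running if k[0] == stage_id and k[2] == task]:
        running.discard(key)  # stand-in for actually killing the attempt

on_success(0, 0, 1, 0)
assert not running  # the zombie attempt in stage attempt 1 is gone too
```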
[jira] [Created] (SPARK-24622) Task attempts in other stage attempts not killed when one task attempt succeeds
Thomas Graves created SPARK-24622:
-
Summary: Task attempts in other stage attempts not killed when one task attempt succeeds
Key: SPARK-24622
URL: https://issues.apache.org/jira/browse/SPARK-24622
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 2.1.0
Reporter: Thomas Graves
[jira] [Updated] (SPARK-24519) MapStatus has 2000 hardcoded
[ https://issues.apache.org/jira/browse/SPARK-24519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24519: --- Description:

MapStatus uses a hardcoded value of 2000 partitions to determine whether it should use highly compressed map status. We should make it configurable to allow users to more easily tune their jobs with respect to this, without requiring them to modify their code to change the number of partitions. Note we can leave this as an internal/undocumented config for now, until we have more advice for users on how to set it.

Some of my reasoning:

A config gives you a way to easily change behavior without having to change code, redeploy the jar, and run again; you simply change the config and rerun. It also allows for easier experimentation. Changing the number of partitions has other side effects, good or bad depending on the situation: it can be worse, as you could be increasing the number of output files when you don't want to, and it affects the number of tasks needed and thus the executors required to run in parallel, etc.

There have been various talks about this number at Spark summits where people have told customers to increase it to 2001 partitions. If you search for "spark 2000 partitions" you will find various things discussing this number. This shows that people are modifying their code to take this into account, so it seems to me that having this configurable would be better. Once we have more advice for users, we could expose and document it.

was: MapStatus uses hardcoded value of 2000 partitions to determine if it should use highly compressed map status. We should make it configurable.
> MapStatus has 2000 hardcoded
>
> Key: SPARK-24519
> URL: https://issues.apache.org/jira/browse/SPARK-24519
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Hieu Tri Huynh
> Priority: Minor
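The proposal amounts to replacing a hardcoded cutoff with a configurable one. A minimal sketch, assuming an illustrative config key (the real key name and class selection logic in Spark may differ):

```python
# Pick the map-status representation from a configurable partition
# threshold instead of a hardcoded 2000.

DEFAULT_THRESHOLD = 2000

def choose_map_status(num_partitions, conf=None):
    conf = conf or {}
    # Config key is illustrative, not necessarily Spark's actual key.
    threshold = int(conf.get("spark.shuffle.minNumPartitionsToHighlyCompress",
                             DEFAULT_THRESHOLD))
    return ("HighlyCompressedMapStatus" if num_partitions > threshold
            else "CompressedMapStatus")

assert choose_map_status(2000) == "CompressedMapStatus"
assert choose_map_status(2001) == "HighlyCompressedMapStatus"
# With the config, users can tune the cutoff without repartitioning:
conf = {"spark.shuffle.minNumPartitionsToHighlyCompress": "1000"}
assert choose_map_status(1500, conf) == "HighlyCompressedMapStatus"
```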
[jira] [Updated] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24552: --- Affects Version/s: 2.2.0, 2.2.1, 2.3.0, 2.3.1
[jira] [Updated] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24552: --- Priority: Blocker (was: Critical)
[jira] [Commented] (SPARK-22148) TaskSetManager.abortIfCompletelyBlacklisted should not abort when all current executors are blacklisted but dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-22148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512691#comment-16512691 ] Thomas Graves commented on SPARK-22148: --- OK, just update if you start working on it. Thanks.

> TaskSetManager.abortIfCompletelyBlacklisted should not abort when all current executors are blacklisted but dynamic allocation is enabled
> ---
>
> Key: SPARK-22148
> URL: https://issues.apache.org/jira/browse/SPARK-22148
> Project: Spark
> Issue Type: Bug
> Components: Scheduler, Spark Core
> Affects Versions: 2.2.0
> Reporter: Juan Rodríguez Hortalá
> Priority: Major
> Attachments: SPARK-22148_WIP.diff
>
> Currently TaskSetManager.abortIfCompletelyBlacklisted aborts the TaskSet and the whole Spark job with `task X (partition Y) cannot run anywhere due to node and executor blacklist. Blacklisting behavior can be configured via spark.blacklist.*.` when all the available executors are blacklisted for a pending task or TaskSet. This makes sense for static allocation, where the set of executors is fixed for the duration of the application, but it might lead to unnecessary job failures when dynamic allocation is enabled. For example, in a Spark application with a single job at a time, when a node fails at the end of a stage attempt, all other executors will complete their tasks, but the tasks running in the executors of the failing node will be pending. Spark will keep waiting for those tasks for 2 minutes by default (spark.network.timeout) until the heartbeat timeout is triggered, and then it will blacklist those executors for that stage. At that point, other executors would have been released after being idle for 1 minute by default (spark.dynamicAllocation.executorIdleTimeout), because the next stage hasn't started yet and so there are no more tasks available (assuming the default of spark.speculation = false). So Spark will fail because the only executors available are blacklisted for that stage.
> An alternative is requesting more executors from the cluster manager in this situation. This could be retried a configurable number of times, with a configurable wait time between request attempts, so if the cluster manager fails to provide a suitable executor then the job is aborted as in the previous case.
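The alternative proposed in the description can be sketched as a bounded retry loop. All names here are illustrative, not Spark's scheduler API: the point is that, under dynamic allocation, the scheduler would ask the cluster manager for fresh executors a configurable number of times before aborting.

```python
# Request new executors for a fully-blacklisted pending task set,
# aborting only after a bounded number of retries.

def try_schedule(pending_task, usable_executors, request_new_executor,
                 max_retries=3):
    for _ in range(max_retries):
        if usable_executors(pending_task):
            return "scheduled"
        request_new_executor()   # ask the cluster manager for capacity
    return "aborted"             # cluster manager never provided one

# With no usable executor ever appearing, the task set is aborted only
# after the configured number of executor requests.
requests = []
result = try_schedule("task-1",
                      usable_executors=lambda t: False,
                      request_new_executor=lambda: requests.append(1))
assert result == "aborted"
assert len(requests) == 3
```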
[jira] [Commented] (SPARK-24539) HistoryServer does not display metrics from tasks that complete after stage failure
[ https://issues.apache.org/jira/browse/SPARK-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512643#comment-16512643 ] Thomas Graves commented on SPARK-24539: --- It's possible. I thought when I checked the history server I was actually seeing them aggregated properly, but I don't know if I checked the specific task events. Probably all related.

> HistoryServer does not display metrics from tasks that complete after stage failure
> ---
>
> Key: SPARK-24539
> URL: https://issues.apache.org/jira/browse/SPARK-24539
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 2.3.1
> Reporter: Imran Rashid
> Priority: Major
>
> I noticed that task metrics for completed tasks with a stage failure do not show up in the new history server. I have a feeling this is because all of the tasks succeeded *after* the stage had been failed (so they were completions from a "zombie" taskset). The task metrics (e.g. the shuffle read size & shuffle write size) do not show up at all, either in the task table, the executor table, or the overall stage summary metrics. (They might not show up in the job summary page either, but in the event logs I have, there is another successful stage attempt after this one, and that is the only thing which shows up in the jobs page.) If you get task details from the API endpoint (e.g. http://[host]:[port]/api/v1/applications/[app-id]/stages/[stage-id]/[stage-attempt]) then you can see the successful tasks and all the metrics.
> Unfortunately the event logs I have are huge and I don't have a small repro handy, but I hope that description is enough to go on.
> I loaded the event logs I have in the SHS from Spark 2.2 and they appear fine.
[jira] [Commented] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512519#comment-16512519 ] Thomas Graves commented on SPARK-24552: --- Sorry, just realized the v2 API is still marked experimental, so downgrading to Critical.
[jira] [Updated] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24552: -- Priority: Critical (was: Blocker)
[jira] [Commented] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512504#comment-16512504 ] Thomas Graves commented on SPARK-24552: --- Note: if this is a correctness bug that can cause data loss/corruption, it needs to be a Blocker, so I changed it to Blocker. If I am misunderstanding, please update.
[jira] [Updated] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24552: -- Priority: Blocker (was: Major)
[jira] [Commented] (SPARK-24552) Task attempt numbers are reused when stages are retried
[ https://issues.apache.org/jira/browse/SPARK-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512500#comment-16512500 ] Thomas Graves commented on SPARK-24552: --- I agree; I don't think changing the attempt number at this point helps much, and it could cause confusion. I would rather see a change like this happen as part of a major reworking of the scheduler.
[jira] [Updated] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24415: -- Priority: Critical (was: Major)

> Stage page aggregated executor metrics wrong when failures
> ---
>
> Key: SPARK-24415
> URL: https://issues.apache.org/jira/browse/SPARK-24415
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 2.3.0
> Reporter: Thomas Graves
> Priority: Critical
> Attachments: Screen Shot 2018-05-29 at 2.15.38 PM.png
>
> Running with Spark 2.3 on YARN with task failures and blacklisting, the aggregated metrics by executor are not correct. In my example it should show 2 failed tasks, but it only shows one. Note I tested with the master branch to verify it's not fixed. I will attach a screen shot.
> To reproduce:
> {code}
> $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true"
>
> import org.apache.spark.SparkEnv
> sc.parallelize(1 to 1, 10).map { x => if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor") else (x % 3, x) }.reduceByKey((a, b) => a + b).collect()
> {code}
[jira] [Commented] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495224#comment-16495224 ] Thomas Graves commented on SPARK-24415: --- OK, so the issue here is in AppStatusListener, which only updates task metrics for live stages. It gets the second taskEnd event after stage 2 was cancelled, so the stage is no longer in the live stages array.
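The failure mode described in this comment can be sketched abstractly. This is a minimal Python model with invented names, not the actual AppStatusListener code: an aggregator that only updates entries still in its "live" map silently drops a task-end event that arrives after the stage was cancelled, so the stored failed-task count ends up one short.

```python
class StageMetrics:
    """Toy aggregator illustrating the 'live stages only' update pattern."""

    def __init__(self):
        self.live_stages = set()
        self.failed_tasks = {}  # stage_id -> count; kept after stage completion

    def on_stage_submitted(self, stage_id):
        self.live_stages.add(stage_id)
        self.failed_tasks.setdefault(stage_id, 0)

    def on_stage_completed(self, stage_id):
        self.live_stages.discard(stage_id)

    def on_task_end(self, stage_id, failed):
        # Bug pattern: only count the failure while the stage is still live.
        if failed and stage_id in self.live_stages:
            self.failed_tasks[stage_id] += 1

m = StageMetrics()
m.on_stage_submitted(2)
m.on_task_end(2, failed=True)   # counted
m.on_stage_completed(2)         # stage 2 cancelled by the blacklist
m.on_task_end(2, failed=True)   # late event silently dropped
assert m.failed_tasks[2] == 1   # two tasks failed, but only one is recorded
```

The per-executor page counts failures independently of stage liveness, which is consistent with it showing 2 failures while the stage's aggregated metrics show 1.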
[jira] [Commented] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495204#comment-16495204 ] Thomas Graves commented on SPARK-24415: --- It also looks like in the history server they show up properly in the aggregated metrics, although if you look at the Tasks (for all stages) column on the jobs page, it only lists a single task failure where it should list 2.
[jira] [Commented] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495200#comment-16495200 ] Thomas Graves commented on SPARK-24415: --- This might actually be an ordering-of-events issue. Note that I configured spark.blacklist.stage.maxFailedTasksPerExecutor=1, so it should really only have 1 failed task, but looking at the log it starts the second task before it has fully handled the blacklist from the first failure:
{code}
18/05/30 13:57:20 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, gsrd259n13.red.ygrid.yahoo.com, executor 2, partition 0, PROCESS_LOCAL, 7746 bytes)
[Stage 2:> (0 + 1) / 10]
18/05/30 13:57:20 INFO BlockManagerMasterEndpoint: Registering block manager gsrd259n13.red.ygrid.yahoo.com:43203 with 912.3 MB RAM, BlockManagerId(2, gsrd259n13.red.ygrid.yahoo.com, 43203, None)
18/05/30 13:57:21 INFO Client: Application report for application_1526529576371_25524 (state: RUNNING)
18/05/30 13:57:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on gsrd259n13.red.ygrid.yahoo.com:43203 (size: 1941.0 B, free: 912.3 MB)
18/05/30 13:57:21 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 3, gsrd259n13.red.ygrid.yahoo.com, executor 2, partition 1, PROCESS_LOCAL, 7747 bytes)
18/05/30 13:57:21 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, gsrd259n13.red.ygrid.yahoo.com, executor 2): java.lang.RuntimeException: Bad executor
18/05/30 13:57:21 INFO TaskSetBlacklist: Blacklisting executor 2 for stage 2
18/05/30 13:57:21 INFO YarnScheduler: Cancelling stage 2
18/05/30 13:57:21 INFO YarnScheduler: Stage 2 was cancelled
18/05/30 13:57:21 INFO DAGScheduler: ShuffleMapStage 2 (map at :26) failed in 12.063 s due to Job aborted due to stage failure:
18/05/30 13:57:21 INFO DAGScheduler: Job 1 failed: collect at :26, took 12.069052 s
{code}
The thing is, though, that the executors page shows 2 task failures on that node; it's just the aggregated metrics for that stage that don't have it.
[jira] [Updated] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24415: -- Description: Running with Spark 2.3 on YARN with task failures and blacklisting, the aggregated metrics by executor are not correct. In my example it should show 2 failed tasks, but it only shows one. Note I tested with the master branch to verify it's not fixed. I will attach a screen shot. To reproduce:
{code}
$SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true"

import org.apache.spark.SparkEnv
sc.parallelize(1 to 1, 10).map { x => if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor") else (x % 3, x) }.reduceByKey((a, b) => a + b).collect()
{code}
[jira] [Updated] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24415: -- Description: Running with Spark 2.3 on YARN with task failures and blacklisting, the aggregated metrics by executor are not correct. In my example it should show 2 failed tasks, but it only shows one. Note I tested with the master branch to verify it's not fixed. I will attach a screen shot. To reproduce:
{code}
$SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true"

sc.parallelize(1 to 1, 10).map { x => if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor") else (x % 3, x) }.reduceByKey((a, b) => a + b).collect()
{code}
[jira] [Updated] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24415: -- Description: Running with Spark 2.3 on YARN with task failures and blacklisting, the aggregated metrics by executor are not correct. In my example it should show 2 failed tasks, but it only shows one. Note I tested with the master branch to verify it's not fixed. I will attach a screen shot. To reproduce:
{code}
$SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true"

sc.parallelize(1 to 1, 10).map { x =>
  if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
{code}
[jira] [Resolved] (SPARK-24413) Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and no active executors
[ https://issues.apache.org/jira/browse/SPARK-24413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-24413. --- Resolution: Duplicate

> Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and no active executors
> ---
>
> Key: SPARK-24413
> URL: https://issues.apache.org/jira/browse/SPARK-24413
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 2.3.0
> Reporter: Thomas Graves
> Priority: Major
>
> Currently, with executor blacklisting enabled and dynamic allocation on, if you have only 1 active executor (the spark.blacklist.killBlacklistedExecutors setting doesn't matter in this case; it can be on or off) and a task failure causes that one executor to be blacklisted, your entire application will fail. The error you get is something like: Aborting TaskSet 0.0 because task 9 (partition 9) cannot run anywhere due to node and executor blacklist.
> This is very undesirable behavior, because you may have a huge job where one task is the long tail; if that task happens to hit a bad node that blacklists it, the entire job fails.
> Ideally, since dynamic allocation is on, the scheduler should not fail immediately but should let dynamic allocation try to get more executors.
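The scheduling decision at issue can be sketched abstractly. This is an illustrative Python model with invented names, not Spark's actual TaskSetManager abort logic: aborting as soon as every current executor is blacklisted is only sound on a static cluster, because under dynamic allocation fresh executors may still arrive.

```python
def should_abort(unschedulable, blacklisted, executors, dynamic_allocation):
    """Decide whether to abort a task set that currently has nowhere to run.

    Hypothetical helper for illustration: `blacklisted` is the set of
    blacklisted executor ids, `executors` the currently active ones.
    """
    everything_blacklisted = all(e in blacklisted for e in executors)
    if unschedulable and everything_blacklisted:
        # Static cluster: no executor will ever become usable again.
        if not dynamic_allocation:
            return True
        # Dynamic allocation: wait and let the allocator request
        # replacement executors instead of failing the whole job.
        return False
    return False

# Single executor, blacklisted, static cluster -> the job must abort.
assert should_abort(True, {"1"}, ["1"], dynamic_allocation=False) is True
# Same situation with dynamic allocation -> keep the task set alive.
assert should_abort(True, {"1"}, ["1"], dynamic_allocation=True) is False
```

This mirrors the improvement requested here (and tracked in the duplicate, SPARK-22148): treat "all executors blacklisted" as a transient condition when the cluster can still grow.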
[jira] [Commented] (SPARK-24413) Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and no active executors
[ https://issues.apache.org/jira/browse/SPARK-24413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494279#comment-16494279 ] Thomas Graves commented on SPARK-24413: --- Thanks for linking those; we can just dup this to SPARK-22148.
[jira] [Commented] (SPARK-24414) Stages page doesn't show all task attempts when failures
[ https://issues.apache.org/jira/browse/SPARK-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494265#comment-16494265 ] Thomas Graves commented on SPARK-24414: --- Also, just an FYI: I also filed SPARK-24415; not sure if they are related, as I haven't dug into that one yet.

> Stages page doesn't show all task attempts when failures
> ---
>
> Key: SPARK-24414
> URL: https://issues.apache.org/jira/browse/SPARK-24414
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 2.3.0
> Reporter: Thomas Graves
> Priority: Critical
>
> If you have task failures, the StagePage doesn't render all the task attempts properly. It seems to make the table the size of the total number of successful tasks rather than including all the failed tasks.
> Even though the table size is smaller, if you sort by various columns you can see all the tasks are actually there; it just seems the size of the table is wrong.
[jira] [Commented] (SPARK-24414) Stages page doesn't show all task attempts when failures
[ https://issues.apache.org/jira/browse/SPARK-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494237#comment-16494237 ] Thomas Graves commented on SPARK-24414: --- I am looking to see if we can just return an empty table in the case where the tasks aren't initialized yet. If you get to it first, that's fine; or let me know if you had something else in mind. > Stages page doesn't show all task attempts when failures > > > Key: SPARK-24414 > URL: https://issues.apache.org/jira/browse/SPARK-24414 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Critical > > If you have task failures, the StagePage doesn't render all the task attempts > properly. It seems to make the table the size of the total number of > successful tasks rather than including all the failed tasks. > Even though the table size is smaller, if you sort by various columns you can > see all the tasks are actually there, it just seems the size of the table is > wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24414) Stages page doesn't show all task attempts when failures
[ https://issues.apache.org/jira/browse/SPARK-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494199#comment-16494199 ] Thomas Graves commented on SPARK-24414: --- looks like this was broken by SPARK-23147, so we probably need to find a different solution. [~vanzin] [~jerryshao] > Stages page doesn't show all task attempts when failures > > > Key: SPARK-24414 > URL: https://issues.apache.org/jira/browse/SPARK-24414 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Critical > > If you have task failures, the StagePage doesn't render all the task attempts > properly. It seems to make the table the size of the total number of > successful tasks rather then including all the failed tasks. > Even though the table size is smaller, if you sort by various columns you can > see all the tasks are actually there, it just seems the size of the table is > wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24415: -- Description: Running with spark 2.3 on yarn and having task failures and blacklisting, the aggregated metrics by executor are not correct. In my example it should have 2 failed tasks but it only shows one. Note I tested with master branch to verify its not fixed. I will attach screen shot. To reproduce: $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true" sc.parallelize(1 to 1, 10).map { x => | if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor") | else (x % 3, x) | } .reduceByKey((a, b) => a + b).collect() was: Running with spark 2.3 on yarn and having task failures and blacklisting, the aggregated metrics by executor are not correct. In my example it should have 2 failed tasks but it only shows one. I will attach screen shot. 
To reproduce: $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true" sc.parallelize(1 to 1, 10).map { x => | if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor") | else (x % 3, x) | }.reduceByKey((a, b) => a + b).collect() > Stage page aggregated executor metrics wrong when failures > --- > > Key: SPARK-24415 > URL: https://issues.apache.org/jira/browse/SPARK-24415 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Major > Attachments: Screen Shot 2018-05-29 at 2.15.38 PM.png > > > Running with spark 2.3 on yarn and having task failures and blacklisting, the > aggregated metrics by executor are not correct. In my example it should have > 2 failed tasks but it only shows one. Note I tested with master branch to > verify its not fixed. > I will attach screen shot. 
> To reproduce: > $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client > --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" > --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf > "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf > "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf > "spark.blacklist.killBlacklistedExecutors=true" > > sc.parallelize(1 to 1, 10).map > { x => | if (SparkEnv.get.executorId.toInt >= 1 && > SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad > executor") | else (x % 3, x) | } > .reduceByKey((a, b) => a + b).collect() -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
Thomas Graves created SPARK-24415: - Summary: Stage page aggregated executor metrics wrong when failures Key: SPARK-24415 URL: https://issues.apache.org/jira/browse/SPARK-24415 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.3.0 Reporter: Thomas Graves Attachments: Screen Shot 2018-05-29 at 2.15.38 PM.png Running with spark 2.3 on yarn and having task failures and blacklisting, the aggregated metrics by executor are not correct. In my example it should have 2 failed tasks but it only shows one. I will attach screen shot. To reproduce: $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true" sc.parallelize(1 to 1, 10).map { x => | if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor") | else (x % 3, x) | }.reduceByKey((a, b) => a + b).collect() -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
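The bug above is about which task attempts get counted. Conceptually, the per-executor aggregation should include every failed attempt, not just one per task. A simplified sketch of that aggregation (hypothetical helper for illustration, not the actual AppStatusStore code):

```python
from collections import defaultdict

def aggregate_by_executor(task_events):
    """Aggregate task outcomes per executor.

    task_events: iterable of (executor_id, succeeded) tuples, one entry per
    task *attempt*. Every failed attempt must be counted, so an executor that
    saw two failures reports failed == 2, matching the expected UI behavior.
    """
    agg = defaultdict(lambda: {"failed": 0, "succeeded": 0})
    for executor_id, ok in task_events:
        agg[executor_id]["succeeded" if ok else "failed"] += 1
    return dict(agg)
```

In the repro, the blacklisted executor should show 2 failed tasks in "Aggregated Metrics by Executor"; the reported bug is that only one is shown.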
[jira] [Updated] (SPARK-24415) Stage page aggregated executor metrics wrong when failures
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24415: -- Attachment: Screen Shot 2018-05-29 at 2.15.38 PM.png > Stage page aggregated executor metrics wrong when failures > --- > > Key: SPARK-24415 > URL: https://issues.apache.org/jira/browse/SPARK-24415 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Major > Attachments: Screen Shot 2018-05-29 at 2.15.38 PM.png > > > Running with spark 2.3 on yarn and having task failures and blacklisting, the > aggregated metrics by executor are not correct. In my example it should have > 2 failed tasks but it only shows one. > I will attach screen shot. > To reproduce: > $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client > --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" > --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf > "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf > "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf > "spark.blacklist.killBlacklistedExecutors=true" > > sc.parallelize(1 to 1, 10).map { x => > | if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= > 4) throw new RuntimeException("Bad executor") > | else (x % 3, x) > | }.reduceByKey((a, b) => a + b).collect() -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24414) Stages page doesn't show all task attempts when failures
[ https://issues.apache.org/jira/browse/SPARK-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494083#comment-16494083 ] Thomas Graves commented on SPARK-24414: --- To reproduce this, simply start a shell:
{code}
$SPARK_HOME/bin/spark-shell --num-executors 5 --master yarn --deploy-mode client
{code}
Run something that gets some task failures but not all:
{code}
sc.parallelize(1 to 1, 10).map { x =>
  if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4)
    throw new RuntimeException("Bad executor")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
{code}
Go to the stages page and you will only see 10 tasks rendered when it should have 21 total between succeeded and failed. > Stages page doesn't show all task attempts when failures > > > Key: SPARK-24414 > URL: https://issues.apache.org/jira/browse/SPARK-24414 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Critical > > If you have task failures, the StagePage doesn't render all the task attempts > properly. It seems to make the table the size of the total number of > successful tasks rather than including all the failed tasks. > Even though the table size is smaller, if you sort by various columns you can > see all the tasks are actually there, it just seems the size of the table is > wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24414) Stages page doesn't show all task attempts when failures
Thomas Graves created SPARK-24414: - Summary: Stages page doesn't show all task attempts when failures Key: SPARK-24414 URL: https://issues.apache.org/jira/browse/SPARK-24414 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.3.0 Reporter: Thomas Graves If you have task failures, the StagePage doesn't render all the task attempts properly. It seems to make the table the size of the total number of successful tasks rather then including all the failed tasks. Even though the table size is smaller, if you sort by various columns you can see all the tasks are actually there, it just seems the size of the table is wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24413) Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and no active executors
[ https://issues.apache.org/jira/browse/SPARK-24413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493978#comment-16493978 ] Thomas Graves commented on SPARK-24413: --- [~imranr] thoughts on this? > Executor Blacklisting shouldn't immediately fail the application if dynamic > allocation is enabled and no active executors > - > > Key: SPARK-24413 > URL: https://issues.apache.org/jira/browse/SPARK-24413 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Major > > Currently with executor blacklisting enabled, dynamic allocation on, and you > only have 1 active executor (spark.blacklist.killBlacklistedExecutors setting > doesn't matter in this case, can be on or off), if you have a task fail that > results in the 1 executor you have getting blacklisted, then your entire > application will fail. The error you get is something like: > Aborting TaskSet 0.0 because task 9 (partition 9) > cannot run anywhere due to node and executor blacklist. > This is very undesirable behavior because you may have a huge job but one > task is the long tail and if it happens to hit a bad node that would > blacklist it, the entire job fail. > Ideally since dynamic allocation is on, the schedule should not immediately > fail but it should let dynamic allocation try to get more executors. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24413) Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and no active executors
[ https://issues.apache.org/jira/browse/SPARK-24413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-24413: -- Summary: Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and no active executors (was: Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and it doesn't have any other active executors ) > Executor Blacklisting shouldn't immediately fail the application if dynamic > allocation is enabled and no active executors > - > > Key: SPARK-24413 > URL: https://issues.apache.org/jira/browse/SPARK-24413 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Major > > Currently with executor blacklisting enabled, dynamic allocation on, and you > only have 1 active executor (spark.blacklist.killBlacklistedExecutors setting > doesn't matter in this case, can be on or off), if you have a task fail that > results in the 1 executor you have getting blacklisted, then your entire > application will fail. The error you get is something like: > Aborting TaskSet 0.0 because task 9 (partition 9) > cannot run anywhere due to node and executor blacklist. > This is very undesirable behavior because you may have a huge job but one > task is the long tail and if it happens to hit a bad node that would > blacklist it, the entire job fail. > Ideally since dynamic allocation is on, the schedule should not immediately > fail but it should let dynamic allocation try to get more executors. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24413) Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and it doesn't have any other active executors
Thomas Graves created SPARK-24413: - Summary: Executor Blacklisting shouldn't immediately fail the application if dynamic allocation is enabled and it doesn't have any other active executors Key: SPARK-24413 URL: https://issues.apache.org/jira/browse/SPARK-24413 Project: Spark Issue Type: Improvement Components: Scheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Currently with executor blacklisting enabled, dynamic allocation on, and you only have 1 active executor (spark.blacklist.killBlacklistedExecutors setting doesn't matter in this case, can be on or off), if you have a task fail that results in the 1 executor you have getting blacklisted, then your entire application will fail. The error you get is something like: Aborting TaskSet 0.0 because task 9 (partition 9) cannot run anywhere due to node and executor blacklist. This is very undesirable behavior because you may have a huge job but one task is the long tail and if it happens to hit a bad node that would blacklist it, the entire job fail. Ideally since dynamic allocation is on, the schedule should not immediately fail but it should let dynamic allocation try to get more executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482746#comment-16482746 ] Thomas Graves commented on SPARK-6235: -- >> Still unsupported: * large task results * large blocks in the WAL * individual records larger than 2 GB Can you clarify what the WAL is? I have seen individual records larger than 2 GB, though I don't think that's as common. Also, can you clarify what large task results means? > Address various 2G limits > - > > Key: SPARK-6235 > URL: https://issues.apache.org/jira/browse/SPARK-6235 > Project: Spark > Issue Type: Umbrella > Components: Shuffle, Spark Core >Reporter: Reynold Xin >Priority: Major > Attachments: SPARK-6235_Design_V0.02.pdf > > > An umbrella ticket to track the various 2G limit we have in Spark, due to the > use of byte arrays and ByteBuffers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
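For context on where the 2 GB ceiling comes from: JVM byte arrays and ByteBuffers use a signed 32-bit int for their length/capacity, so any code path that materializes a block, record, or task result as a single array or buffer tops out just under 2 GiB. A quick, language-agnostic sanity check of the arithmetic:

```python
# Largest value of a signed 32-bit int, which bounds JVM array length and
# ByteBuffer capacity (most JVMs cap arrays a few bytes lower in practice).
INT_MAX = 2**31 - 1

def fits_in_one_jvm_array(num_bytes):
    """True if a block of this size could be held in a single byte[] / ByteBuffer."""
    return num_bytes <= INT_MAX

# A full 2 GiB block does not fit, hence the need to chunk large blocks
# instead of buffering them whole.
```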
[jira] [Commented] (SPARK-21033) fix the potential OOM in UnsafeExternalSorter
[ https://issues.apache.org/jira/browse/SPARK-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467803#comment-16467803 ] Thomas Graves commented on SPARK-21033: --- [~cloud_fan] the follow-up PR [https://github.com/apache/spark/pull/21077] didn't go into Spark 2.3.0; this should have had its own JIRA, and we need to update the fix version. Can you please fix it so we properly track which version this is in? Also, does this need to be backported to 2.3.1? > fix the potential OOM in UnsafeExternalSorter > - > > Key: SPARK-21033 > URL: https://issues.apache.org/jira/browse/SPARK-21033 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 2.3.0 > > > In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for > pointer, 1 `long` for key-prefix, and another 2 `long`s as the temporary > buffer for radix sort. > In `UnsafeExternalSorter`, we set the > `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to be `1024 * 1024 * 1024 / 2`, > and hoping the max size of point array to be 8 GB. However this is wrong, > `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow the point > array before reach this limitation, we may hit the max-page-size error. 
> Users may see exception like this on large dataset: > {code} > Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with > more than 17179869176 bytes > at > org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241) > at > org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94) > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
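The size arithmetic in the description is easy to verify: with 32 bytes per record, a spill threshold of 1024 * 1024 * 1024 / 2 entries corresponds to a 16 GiB pointer array, twice the intended 8 GiB cap, and above the maximum page request seen in the stack trace:

```python
BYTES_PER_RECORD = 32  # pointer (8) + key prefix (8) + radix-sort buffer (16)
SPILL_THRESHOLD = 1024 * 1024 * 1024 // 2  # DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD

# Worst-case pointer-array size at the spill threshold: 16 GiB, not 8 GiB.
array_bytes = SPILL_THRESHOLD * BYTES_PER_RECORD

# The allocation that failed in the reported stack trace; note it is
# exactly 8 bytes short of 16 GiB.
MAX_PAGE_REQUEST = 17179869176  # bytes, from the IllegalArgumentException
```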
[jira] [Commented] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly
[ https://issues.apache.org/jira/browse/SPARK-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458835#comment-16458835 ] Thomas Graves commented on SPARK-24124: --- [~vanzin] any objections to this? > Spark history server should create spark.history.store.path and set > permissions properly > > > Key: SPARK-24124 > URL: https://issues.apache.org/jira/browse/SPARK-24124 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Major > > Current with the new spark history server you can set > spark.history.store.path to a location to store the levelDB files. Currently > the directory has to be made before it can use that path. > We should just have the history server create it and set the file permissions > on the leveldb files to be restrictive -> new FsPermission((short) 0700) > the shuffle service already does this, this would be much more convenient to > use and prevent people from making mistakes with the permissions on the > directory and files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24124) Spark history server should create spark.history.store.path and set permissions properly
Thomas Graves created SPARK-24124: - Summary: Spark history server should create spark.history.store.path and set permissions properly Key: SPARK-24124 URL: https://issues.apache.org/jira/browse/SPARK-24124 Project: Spark Issue Type: Story Components: Spark Core Affects Versions: 2.3.0 Reporter: Thomas Graves Current with the new spark history server you can set spark.history.store.path to a location to store the levelDB files. Currently the directory has to be made before it can use that path. We should just have the history server create it and set the file permissions on the leveldb files to be restrictive -> new FsPermission((short) 0700) the shuffle service already does this, this would be much more convenient to use and prevent people from making mistakes with the permissions on the directory and files. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
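A minimal sketch of the proposed behavior (hypothetical helper, not the actual history server code): create the store directory if it is missing and clamp it to owner-only access, mirroring what the shuffle service does with FsPermission((short) 0700).

```python
import os
import stat

def ensure_store_path(path):
    """Create the local store directory if needed and restrict it to mode 0700.

    Equivalent in spirit to the proposal: the history server creates
    spark.history.store.path itself rather than requiring the admin to
    pre-create it, and makes the levelDB files readable only by the owner.
    """
    os.makedirs(path, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)  # rwx for owner only, i.e. 0700
    return path
```

Doing this at startup removes a manual setup step and prevents accidentally world-readable history data.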
[jira] [Assigned] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-22683: - Assignee: Julien Cuquemelle > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Assignee: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > Fix For: 2.4.0 > > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 of such jobs each day > - these jobs are usually one shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. > - experiments have been conducted with spark.executor-cores = 5 and 10, only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands splits in the data > partitioning and between 400 to 9000 seconds of wall clock time. 
> - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small regarding executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per core, with a 30% resource usage reduction, for a similar wall clock > time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments) ? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands splits can occur), and essentially remove the > interest of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependant on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. > - I've still done a series of experiments, details in the comments. 
> Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance and such old executors could become contention > points for other exes trying to remotely access blocks in the old exes (not > witnessed in the jobs I'm talking about, but we did see this behavior in > other jobs) > Proposal: > Simply add a tasksPerExecutorSlot parameter, which makes it possible to > specify how many tasks a single taskSlot should ideally execute to mitigate > the overhead of executor allocation. > PR: https://github.com/apache/spark/pull/19881 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450131#comment-16450131 ] Thomas Graves commented on SPARK-22683: --- Note this added a new config spark.dynamicAllocation.executorAllocationRatio, default to 1.0 which is the same behavior as existing releases. > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > Fix For: 2.4.0 > > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 of such jobs each day > - these jobs are usually one shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. 
> - experiments have been conducted with spark.executor-cores = 5 and 10, only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands splits in the data > partitioning and between 400 to 9000 seconds of wall clock time. > - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small regarding executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per core, with a 30% resource usage reduction, for a similar wall clock > time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments) ? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands splits can occur), and essentially remove the > interest of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependant on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. > - I've still done a series of experiments, details in the comments. 
> Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance and such old executors could become contention > points for other exes trying to remotely access blocks in the old exes (not > witnessed in the jobs I'm talking about, but we did see this behavior in > other jobs) > Proposal: > Simply add a tasksPerExecutorSlot parameter, which makes it possible to > specify how many tasks a single taskSlot should ideally execute to mitigate > the overhead of executor allocation. > PR: https://github.com/apache/spark/pull/19881 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubs
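The effect of the ratio added for this issue (spark.dynamicAllocation.executorAllocationRatio) can be illustrated with a simplified model of the dynamic-allocation target. This is an approximation for illustration, not the exact ExecutorAllocationManager formula:

```python
import math

def max_executors_needed(pending_tasks, executor_cores, task_cpus=1,
                         allocation_ratio=1.0):
    """Approximate the dynamic-allocation executor target.

    tasks_per_executor = executor_cores // task_cpus (task slots per executor).
    allocation_ratio = 1.0 reproduces the old one-slot-per-task behavior;
    smaller ratios aim for multiple tasks per slot, trading some latency
    for fewer, better-utilized containers.
    """
    tasks_per_executor = executor_cores // task_cpus
    return math.ceil(pending_tasks * allocation_ratio / tasks_per_executor)

# 600 pending tasks on 5-core executors:
#   ratio 1.0  -> 120 executors (one slot per task)
#   ratio 1/6  -> 20 executors (~6 tasks per slot, as in the proposal)
```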
[jira] [Resolved] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-22683. --- Resolution: Fixed Fix Version/s: 2.4.0 > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > Fix For: 2.4.0 > > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 such jobs each day > - these jobs are usually one-shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. > - experiments have been conducted with spark.executor.cores = 5 and 10; only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands of splits in the data > partitioning and between 400 and 9000 seconds of wall clock time. 
> - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small relative to the executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per taskSlot, with a 30% resource usage reduction, for a similar wall > clock time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments)? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands of splits can occur), and essentially defeat the > point of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependent on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. > - I've still done a series of experiments, details in the comments. 
> Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance, and such old executors could become contention > points for other executors trying to remotely access blocks on the old > executors (not witnessed in the jobs I'm talking about, but we did see this > behavior in other jobs) > Proposal: > Simply add a tasksPerExecutorSlot parameter, which makes it possible to > specify how many tasks a single taskSlot should ideally execute to mitigate > the overhead of executor allocation. > PR: https://github.com/apache/spark/pull/19881 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
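The arithmetic behind the proposal is simple enough to sketch. The following is an illustrative model only (names like `targetExecutors` are invented here; this is not the actual Spark dynamic-allocation code), showing how a tasks-per-slot factor divides down the executor target:

```java
// Illustrative sketch only: how a hypothetical tasksPerExecutorSlot knob
// would shrink the dynamic-allocation executor target. Names are invented
// for the example; this is not the actual Spark implementation.
class DynamicAllocationSketch {
    // spark.executor.cores / spark.task.cpus, as defined in the issue text.
    static int taskSlotsPerExecutor(int executorCores, int taskCpus) {
        return executorCores / taskCpus;
    }

    // Default dynamic allocation aims for one slot per pending task
    // (tasksPerSlot = 1); larger values let each slot run several tasks
    // in sequence, trading latency for fewer allocated containers.
    static int targetExecutors(int pendingTasks, int executorCores,
                               int taskCpus, int tasksPerSlot) {
        int slotsNeeded = (int) Math.ceil((double) pendingTasks / tasksPerSlot);
        return (int) Math.ceil(
            (double) slotsNeeded / taskSlotsPerExecutor(executorCores, taskCpus));
    }
}
```

With 600 pending tasks, spark.executor.cores = 5 and spark.task.cpus = 1, the default (1 task per slot) targets 120 executors while 6 tasks per slot targets 20, which is the latency-vs-resource trade-off measured in the experiments above.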
[jira] [Commented] (SPARK-23850) We should not redact username|user|url from UI by default
[ https://issues.apache.org/jira/browse/SPARK-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449054#comment-16449054 ] Thomas Graves commented on SPARK-23850: --- the url redaction seems somewhat silly to me too; look at the environment page on yarn, at least in our environment it shows "redacted" in a bunch of places that don't make sense. If it's an issue with the thriftserver and certain urls we should fix those separately. > We should not redact username|user|url from UI by default > - > > Key: SPARK-23850 > URL: https://issues.apache.org/jira/browse/SPARK-23850 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > SPARK-22479 was filed to not print the log jdbc credentials, but in there > they also added the username and url to be redacted. I'm not sure why these > were added and to me by default these do not have security concerns. It > makes it more usable by default to be able to see these things. Users with > high security concerns can simply add them in their configs. > Also on yarn just redacting url doesn't secure anything because if you go to > the environment ui page you see all sorts of paths and really its just > confusing that some of its redacted and other parts aren't. If this was > specifically for jdbc I think it needs to be just applied there and not > broadly. > If we remove these we need to test what the jdbc driver is going to log from > SPARK-22479. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23964) why does Spillable wait for 32 elements?
[ https://issues.apache.org/jira/browse/SPARK-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444043#comment-16444043 ] Thomas Graves commented on SPARK-23964: --- so far in my testing I haven't seen any performance regressions. Doing the accounting to acquire more memory takes no time at all. Obviously if you have a small heap and it can't acquire more memory, it will spill but that is what you want so you don't oom. > why does Spillable wait for 32 elements? > > > Key: SPARK-23964 > URL: https://issues.apache.org/jira/browse/SPARK-23964 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > The spillable class has a check in maybeSpill as to when it tries to acquire > more memory and determine if it should spill: > if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { > Before it looks to see if it should spill. > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] > I'm wondering why it has the elementsRead %32 in it? If I have a small > number of elements that are huge this can easily cause OOM before we actually > spill. > I saw a few conversations on this and one Jira related: > https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an > answer to this. > anyone have history on this? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15703) Make ListenerBus event queue size configurable
[ https://issues.apache.org/jira/browse/SPARK-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440855#comment-16440855 ] Thomas Graves commented on SPARK-15703: --- this Jira is purely making the size of the event queue configurable which would allow you to increase it as long as you have sufficient driver memory. There is no current fix for it dropping events. There is a fix that went into 2.3 that makes it so the critical services aren't affected: https://issues.apache.org/jira/browse/SPARK-18838 > Make ListenerBus event queue size configurable > -- > > Key: SPARK-15703 > URL: https://issues.apache.org/jira/browse/SPARK-15703 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Web UI >Affects Versions: 2.0.0 >Reporter: Thomas Graves >Assignee: Dhruve Ashar >Priority: Minor > Fix For: 2.0.1, 2.1.0 > > Attachments: Screen Shot 2016-06-01 at 11.21.32 AM.png, Screen Shot > 2016-06-01 at 11.23.48 AM.png, SparkListenerBus .png, > spark-dynamic-executor-allocation.png > > > The Spark UI doesn't seem to be showing all the tasks and metrics. > I ran a job with 10 tasks but Detail stage page says it completed 93029: > Summary Metrics for 93029 Completed Tasks > The Stages for all jobs pages list that only 89519/10 tasks finished but > its completed. The metrics for shuffled write and input are also incorrect. > I will attach screen shots. > I checked the logs and it does show that all the tasks actually finished. > 16/06/01 16:15:42 INFO TaskSetManager: Finished task 59880.0 in stage 2.0 > (TID 54038) in 265309 ms on 10.213.45.51 (10/10) > 16/06/01 16:15:42 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks > have all completed, from pool -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
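For context on what "dropping events" means here, a minimal sketch of a bounded listener event queue (class and method names are invented for illustration; the real bus is considerably more involved, and the capacity is what SPARK-15703 made configurable via spark.scheduler.listenerbus.eventqueue.size):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Rough sketch of the behaviour under discussion: the listener bus posts
// into a bounded queue, and a post to a full queue drops the event instead
// of blocking the posting thread. Names are invented for this example.
class BoundedEventQueue<E> {
    private final LinkedBlockingQueue<E> queue;
    private int dropped = 0;

    BoundedEventQueue(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    // offer() returns false on a full queue; the real bus logs a warning
    // the first time events start being dropped.
    boolean post(E event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    int droppedCount() {
        return dropped;
    }
}
```

Once events are dropped, downstream consumers such as the UI see fewer task-end events than actually occurred, which is exactly the symptom in the screenshots attached to this issue.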
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436366#comment-16436366 ] Thomas Graves commented on YARN-8149: - thinking about this a little more, even with the current preemption on, I don't think preemption is smart enough to keep starvation from happening. If preemption was smart enough to kill enough containers on a reserved node to make it so the big container actually gets scheduled there, that might be ok. But last time I checked it doesn't do that. Without that, or another way to prevent starvation, I wouldn't want to remove this. I think adding a config would be alright, but if anyone finds it useful we can't remove it and it would just be an extra config. If we have other ideas to simplify this or make it better, great, we should look at them. Or if there is a way for us to get stats on whether this is useful, we could add those, run, and determine if we should remove it. > Revisit behavior of Re-Reservation in Capacity Scheduler > > > Key: YARN-8149 > URL: https://issues.apache.org/jira/browse/YARN-8149 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Priority: Critical > > Frankly speaking, I'm not sure why we need the re-reservation. The formula is > not that easy to understand: > Inside: > {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}} > {code:java} > starvation = re-reservation / (#reserved-container * > (1 - min(requested-resource / max-alloc, > max-alloc - min-alloc / max-alloc))) > should_allocate = starvation + requiredContainers - reservedContainers > > 0{code} > I think we should be able to remove the starvation computation, just to check > requiredContainers > reservedContainers should be enough. > In a large cluster, we can easily overflow re-reservation to MAX_INT, see > YARN-7636. 
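To make the quoted formula concrete, here is a direct transcription with scalar stand-ins for YARN Resource objects (illustration only, not the RegularContainerAllocator code):

```java
// Direct transcription of the starvation formula quoted in the issue, with
// scalar stand-ins for YARN Resource objects, purely to make the
// computation concrete; this is not the RegularContainerAllocator code.
class ReReservationSketch {
    static boolean shouldAllocOrReserve(long reReservations,
                                        int reservedContainers,
                                        int requiredContainers,
                                        double requested,
                                        double minAlloc,
                                        double maxAlloc) {
        // starvation grows with repeated re-reservations of the same
        // container, discounted by how large the request is relative to
        // the maximum allocation.
        double starvation =
            reReservations / (reservedContainers
                * (1 - Math.min(requested / maxAlloc,
                                (maxAlloc - minAlloc) / maxAlloc)));
        return starvation + requiredContainers - reservedContainers > 0;
    }
}
```

With re-reservations at 0 the check reduces to requiredContainers > reservedContainers, which is exactly the simplification proposed in the description; the starvation term only starts to matter once the same container has been re-reserved many times (and, per YARN-7636, the counter can overflow).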
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436295#comment-16436295 ] Thomas Graves commented on YARN-8149: - are you going to do anything with starvation then, or allocate a certain % more than what is required? I am hesitant to remove this without doing some major testing. I haven't had a chance to look at the latest code to investigate. It might be fine now that we continue looking at other nodes after reservation, whereas originally that didn't happen. Is in-queue preemption on by default? > Revisit behavior of Re-Reservation in Capacity Scheduler > > > Key: YARN-8149 > URL: https://issues.apache.org/jira/browse/YARN-8149 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Priority: Critical > > Frankly speaking, I'm not sure why we need the re-reservation. The formula is > not that easy to understand: > Inside: > {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}} > {code:java} > starvation = re-reservation / (#reserved-container * > (1 - min(requested-resource / max-alloc, > max-alloc - min-alloc / max-alloc))) > should_allocate = starvation + requiredContainers - reservedContainers > > 0{code} > I think we should be able to remove the starvation computation, just to check > requiredContainers > reservedContainers should be enough. > In a large cluster, we can easily overflow re-reservation to MAX_INT, see > YARN-7636. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (SPARK-23964) why does Spillable wait for 32 elements?
[ https://issues.apache.org/jira/browse/SPARK-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434479#comment-16434479 ] Thomas Graves commented on SPARK-23964: --- I'm not sure, I'm trying to figure out if there are performance implications here, and perhaps there are, but it's at the cost of not being accurate on memory usage. In the deployments with fixed-size containers this is very important. If you wait 32 elements it may cause you to acquire a bigger chunk of memory at once vs getting more, smaller allocations. I would think the only check you need is: currentMemory >= myMemoryThreshold. The initial threshold is 5MB right now, but all it's doing is asking for more memory; only when it can't get memory does it spill. And the initial threshold is configurable so you can always make it bigger. I'm going to try to do some performance tests to see what happens but would like to know if anyone has other background. > why does Spillable wait for 32 elements? > > > Key: SPARK-23964 > URL: https://issues.apache.org/jira/browse/SPARK-23964 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > The spillable class has a check in maybeSpill as to when it tries to acquire > more memory and determine if it should spill: > if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { > Before it looks to see if it should spill. > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] > I'm wondering why it has the elementsRead %32 in it? If I have a small > number of elements that are huge this can easily cause OOM before we actually > spill. > I saw a few conversations on this and one Jira related: > https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an > answer to this. > anyone have history on this? 
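A simplified model of the maybeSpill logic being discussed helps ground this: the `elementsRead % 32 == 0` guard only controls how often the collection tries to grow its memory share; the spill decision is taken when the grant comes back too small. (`acquire` stands in for the memory-manager call, which may grant less than requested; the real Spillable also has a separate force-spill element-count threshold, omitted here.)

```java
import java.util.function.LongUnaryOperator;

// Simplified model of Spillable.maybeSpill, for illustration only.
class SpillableSketch {
    // Returns true if the caller should spill; thresholdOut[0] receives
    // the (possibly grown) memory threshold.
    static boolean maybeSpill(long elementsRead, long currentMemory,
                              long memoryThreshold, LongUnaryOperator acquire,
                              long[] thresholdOut) {
        long threshold = memoryThreshold;
        boolean shouldSpill = false;
        if (elementsRead % 32 == 0 && currentMemory >= threshold) {
            // Ask for enough to double our current share.
            long amountToRequest = 2 * currentMemory - threshold;
            threshold += acquire.applyAsLong(amountToRequest);
            // Spill only if, even after the request, we are still at or
            // over the threshold (i.e. the grant was too small).
            shouldSpill = currentMemory >= threshold;
        }
        thresholdOut[0] = threshold;
        return shouldSpill;
    }
}
```

The concern in the issue is visible in the guard: between multiples of 32, a handful of huge records can push actual usage far past the threshold without any spill check running.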
[jira] [Updated] (SPARK-23964) why does Spillable wait for 32 elements?
[ https://issues.apache.org/jira/browse/SPARK-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-23964: -- Description: The spillable class has a check in maybeSpill as to when it tries to acquire more memory and determine if it should spill: if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { Before it looks to see if it should spill. [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] I'm wondering why it has the elementsRead %32 in it? If I have a small number of elements that are huge this can easily cause OOM before we actually spill. I saw a few conversations on this and one Jira related: https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an answer to this. anyone have history on this? was: The spillable class has a check: if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { Before it looks to see if it should spill. [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] I'm wondering why it has the elementsRead %32 in it? If I have a small number of elements that are huge this can easily cause OOM before we actually spill. I saw a few conversations on this and one Jira related: https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an answer to this. anyone have history on this? > why does Spillable wait for 32 elements? > > > Key: SPARK-23964 > URL: https://issues.apache.org/jira/browse/SPARK-23964 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > The spillable class has a check in maybeSpill as to when it tries to acquire > more memory and determine if it should spill: > if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { > Before it looks to see if it should spill. 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] > I'm wondering why it has the elementsRead %32 in it? If I have a small > number of elements that are huge this can easily cause OOM before we actually > spill. > I saw a few conversations on this and one Jira related: > https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an > answer to this. > anyone have history on this? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23964) why does Spillable wait for 32 elements?
[ https://issues.apache.org/jira/browse/SPARK-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434454#comment-16434454 ] Thomas Graves commented on SPARK-23964: --- [~andrewor14] [~matei] [~r...@databricks.com] A few related threads: [https://github.com/apache/spark/pull/3302] [https://github.com/apache/spark/pull/3656] https://github.com/apache/spark/commit/3be92cdac30cf488e09dbdaaa70e5c4cdaa9a099 > why does Spillable wait for 32 elements? > > > Key: SPARK-23964 > URL: https://issues.apache.org/jira/browse/SPARK-23964 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > The spillable class has a check: > if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { > Before it looks to see if it should spill. > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] > I'm wondering why it has the elementsRead %32 in it? If I have a small > number of elements that are huge this can easily cause OOM before we actually > spill. > I saw a few conversations on this and one Jira related: > https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an > answer to this. > anyone have history on this? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23964) why does Spillable wait for 32 elements?
[ https://issues.apache.org/jira/browse/SPARK-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-23964: -- Description: The spillable class has a check: if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { Before it looks to see if it should spill. [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] I'm wondering why it has the elementsRead %32 in it? If I have a small number of elements that are huge this can easily cause OOM before we actually spill. I saw a few conversations on this and one Jira related: https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an answer to this. anyone have history on this? > why does Spillable wait for 32 elements? > > > Key: SPARK-23964 > URL: https://issues.apache.org/jira/browse/SPARK-23964 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > The spillable class has a check: > if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { > Before it looks to see if it should spill. > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] > I'm wondering why it has the elementsRead %32 in it? If I have a small > number of elements that are huge this can easily cause OOM before we actually > spill. > I saw a few conversations on this and one Jira related: > https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an > answer to this. > anyone have history on this? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23964) why does Spillable wait for 32 elements?
[ https://issues.apache.org/jira/browse/SPARK-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-23964: -- Environment: (was: The spillable class has a check: if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { Before it looks to see if it should spill. [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] I'm wondering why it has the elementsRead %32 in it? If I have a small number of elements that are huge this can easily cause OOM before we actually spill. I saw a few conversations on this and one Jira related: https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an answer to this. anyone have history on this?) > why does Spillable wait for 32 elements? > > > Key: SPARK-23964 > URL: https://issues.apache.org/jira/browse/SPARK-23964 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23964) why does Spillable wait for 32 elements?
Thomas Graves created SPARK-23964: - Summary: why does Spillable wait for 32 elements? Key: SPARK-23964 URL: https://issues.apache.org/jira/browse/SPARK-23964 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Environment: The spillable class has a check: if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) { Before it looks to see if it should spill. [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83] I'm wondering why it has the elementsRead %32 in it? If I have a small number of elements that are huge this can easily cause OOM before we actually spill. I saw a few conversations on this and one Jira related: https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an answer to this. anyone have history on this? Reporter: Thomas Graves -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.
[ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432467#comment-16432467 ] Thomas Graves commented on SPARK-16630: --- sorry I don't follow, the list we get from the blacklist tracker is all nodes that are currently blacklisted and haven't met the expiry to unblacklist them. You just union them with the yarn allocator list. There is obviously some race condition there if one of the nodes is just about to be unblacklisted, but I don't see that as a major issue; the next allocation will not have it. Is there something I'm missing? > Blacklist a node if executors won't launch on it. > - > > Key: SPARK-16630 > URL: https://issues.apache.org/jira/browse/SPARK-16630 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.2 >Reporter: Thomas Graves >Priority: Major > > On YARN, its possible that a node is messed or misconfigured such that a > container won't launch on it. For instance if the Spark external shuffle > handler didn't get loaded on it , maybe its just some other hardware issue or > hadoop configuration issue. > It would be nice we could recognize this happening and stop trying to launch > executors on it since that could end up causing us to hit our max number of > executor failures and then kill the job. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.
[ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432244#comment-16432244 ] Thomas Graves commented on SPARK-16630: --- yes I think it would make sense as the union of all blacklisted nodes. I'm not sure what you mean by your last question. The expiry currently is all handled in the BlacklistTracker, I wouldn't want to move that out into the yarn allocator. Just use the information passed to it unless there is a case it doesn't cover? > Blacklist a node if executors won't launch on it. > - > > Key: SPARK-16630 > URL: https://issues.apache.org/jira/browse/SPARK-16630 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.2 >Reporter: Thomas Graves >Priority: Major > > On YARN, its possible that a node is messed or misconfigured such that a > container won't launch on it. For instance if the Spark external shuffle > handler didn't get loaded on it , maybe its just some other hardware issue or > hadoop configuration issue. > It would be nice we could recognize this happening and stop trying to launch > executors on it since that could end up causing us to hit our max number of > executor failures and then kill the job. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.
[ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427377#comment-16427377 ] Thomas Graves commented on SPARK-16630: --- the problem is that spark.executor.instances (or dynamic allocation) doesn't necessarily represent the # of nodes in the cluster, especially if you look at dynamic allocation. Depending on the size of your nodes you can have a lot more executors than nodes, thus it could easily end up blacklisting the entire cluster. I would rather look at the actual # of nodes in the cluster. Is that turning out to be hard? > Blacklist a node if executors won't launch on it. > - > > Key: SPARK-16630 > URL: https://issues.apache.org/jira/browse/SPARK-16630 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.2 >Reporter: Thomas Graves >Priority: Major > > On YARN, its possible that a node is messed or misconfigured such that a > container won't launch on it. For instance if the Spark external shuffle > handler didn't get loaded on it , maybe its just some other hardware issue or > hadoop configuration issue. > It would be nice we could recognize this happening and stop trying to launch > executors on it since that could end up causing us to hit our max number of > executor failures and then kill the job. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23567) spark.redaction.regex should not include user by default, docs not updated
[ https://issues.apache.org/jira/browse/SPARK-23567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-23567. --- Resolution: Duplicate > spark.redaction.regex should not include user by default, docs not updated > -- > > Key: SPARK-23567 > URL: https://issues.apache.org/jira/browse/SPARK-23567 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > SPARK-22479 changed to redact the user name by default. I would argue > username isn't something that should be redacted by default and it's very > useful for debugging and other things. If people are running super secure and > want to turn it on they can, but I don't see the user name as a default > security setting. There are also other ways on the UI to see the user name; > for instance on yarn you can go to the Environment page, look at the > resources, and see the username in the paths. > Also the Jira did not update the default setting in the docs, so the docs are > out of date: > http://spark.apache.org/docs/2.2.1/configuration.html -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23039) Fix the bug in alter table set location.
[ https://issues.apache.org/jira/browse/SPARK-23039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423996#comment-16423996 ] Thomas Graves commented on SPARK-23039: --- seems to be a dup of SPARK-23057 > Fix the bug in alter table set location. > - > > Key: SPARK-23039 > URL: https://issues.apache.org/jira/browse/SPARK-23039 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: xubo245 >Priority: Critical > > TODO work: Fix the bug in alter table set location. > org.apache.spark.sql.execution.command.DDLSuite#testSetLocation > {code:java} > // TODO(gatorsmile): fix the bug in alter table set location. >//if (isUsingHiveMetastore) { > //assert(storageFormat.properties.get("path") === expected) > // } > {code} > Analysis: > because the user adds locationUri and erases path via > {code:java} > newPath = None > {code} > in org.apache.spark.sql.hive.HiveExternalCatalog#restoreDataSourceTable: > {code:java} > val storageWithLocation = { > val tableLocation = getLocationFromStorageProps(table) > // We pass None as `newPath` here, to remove the path option in storage > properties. > updateLocationInStorageProps(table, newPath = None).copy( > locationUri = tableLocation.map(CatalogUtils.stringToURI(_))) > } > {code} > => > newPath = None -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23850) We should not redact username|user|url from UI by default
[ https://issues.apache.org/jira/browse/SPARK-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422965#comment-16422965 ] Thomas Graves commented on SPARK-23850: --- ping [~ash...@gmail.com] [~onursatici] [~LI,Xiao] [~jiangxb1987] [~cloud_fan] who were on the code review: is there more background on why these were added? > We should not redact username|user|url from UI by default > - > > Key: SPARK-23850 > URL: https://issues.apache.org/jira/browse/SPARK-23850 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > SPARK-22479 was filed to not print the jdbc credentials in the logs, but in there > they also added the username and url to be redacted. I'm not sure why these > were added; to me, these do not pose security concerns by default. It > makes it more usable by default to be able to see these things. Users with > high security concerns can simply add them in their configs. > Also on yarn just redacting the url doesn't secure anything, because if you go to > the environment ui page you see all sorts of paths, and really it's just > confusing that some of it is redacted and other parts aren't. If this was > specifically for jdbc, I think it needs to be applied just there and not > broadly. > If we remove these we need to test what the jdbc driver is going to log from > SPARK-22479. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23850) We should not redact username|user|url from UI by default
Thomas Graves created SPARK-23850: - Summary: We should not redact username|user|url from UI by default Key: SPARK-23850 URL: https://issues.apache.org/jira/browse/SPARK-23850 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.2.1 Reporter: Thomas Graves SPARK-22479 was filed to not print the jdbc credentials in the logs, but in there they also added the username and url to be redacted. I'm not sure why these were added; to me, these do not pose security concerns by default. It makes it more usable by default to be able to see these things. Users with high security concerns can simply add them in their configs. Also on yarn just redacting the url doesn't secure anything, because if you go to the environment ui page you see all sorts of paths, and really it's just confusing that some of it is redacted and other parts aren't. If this was specifically for jdbc, I think it needs to be applied just there and not broadly. If we remove these we need to test what the jdbc driver is going to log from SPARK-22479. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23806) Broadcast.unpersist can cause fatal exception when used with dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-23806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-23806: -- Description: Very similar to https://issues.apache.org/jira/browse/SPARK-22618 . But this could also apply to Broadcast.unpersist. 2018-03-24 05:29:17,836 [Spark Context Cleaner] ERROR org.apache.spark.ContextCleaner - Error cleaning broadcast 85710 org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:152) at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:306) at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45) at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60) at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:185) at scala.Option.foreach(Option.scala:257) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:185) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1286) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) Caused by: java.io.IOException: Failed to send RPC 7228115282075984867 to /10.10.10.10:53804: java.nio.channels.ClosedChannelException at 
org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:852) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:738) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1251) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:733) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:725) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:35) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1062) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1116) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1051) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException was: Very similar to https://issues.apache.org/jira/browse/SPARK-2261 . But this could also apply to Broadcast.unpersist. 
2018-03-24 05:29:17,836 [Spark Context Cleaner] ERROR org.apache.spark.ContextCleaner - Error cleaning broadcast 85710 org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:152) at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:306) at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45) at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60) at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun
[jira] [Created] (SPARK-23806) Broadcast.unpersist can cause fatal exception when used with dynamic allocation
Thomas Graves created SPARK-23806: - Summary: Broadcast. unpersist can cause fatal exception when used with dynamic allocation Key: SPARK-23806 URL: https://issues.apache.org/jira/browse/SPARK-23806 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: Thomas Graves Very similar to https://issues.apache.org/jira/browse/SPARK-2261 . But this could also apply to Broadcast.unpersist. 2018-03-24 05:29:17,836 [Spark Context Cleaner] ERROR org.apache.spark.ContextCleaner - Error cleaning broadcast 85710 org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:152) at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:306) at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45) at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60) at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:185) at scala.Option.foreach(Option.scala:257) at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:185) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1286) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) Caused by: java.io.IOException: Failed to send RPC 7228115282075984867 to 
/10.10.10.10:53804: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:852) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:738) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1251) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:733) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:725) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:35) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1062) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1116) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1051) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For 
additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22618) RDD.unpersist can cause fatal exception when used with dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417466#comment-16417466 ] Thomas Graves commented on SPARK-22618: --- I'll file a separate Jira for it and put up a PR. > RDD.unpersist can cause fatal exception when used with dynamic allocation > - > > Key: SPARK-22618 > URL: https://issues.apache.org/jira/browse/SPARK-22618 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Brad >Assignee: Brad >Priority: Minor > Fix For: 2.3.0 > > > If you use rdd.unpersist() with dynamic allocation, then an executor can be > deallocated while your rdd is being removed, which will throw an uncaught > exception killing your job. > I looked into different ways of preventing this error from occurring but > couldn't come up with anything that wouldn't require a big change. I propose > the best fix is just to catch and log IOExceptions in unpersist() so they > don't kill your job. This will match the effective behavior when executors > are lost from dynamic allocation in other parts of the code. > In the worst case scenario I think this could lead to RDD partitions getting > left on executors after they were unpersisted, but this is probably better > than the whole job failing. I think in most cases the IOException would be > due to the executor dying for some reason, which is effectively the same > result as unpersisting the rdd from that executor anyway. > I noticed this exception in a job that loads a 100GB dataset on a cluster > where we use dynamic allocation heavily. 
Here is the relevant stack trace > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221) > at > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899) > at > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:276) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > at java.lang.Thread.run(Thread.java:748) > Exception in thread "main" org.apache.spark.SparkException: Exception thrown > in awaitResult: > at > org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:131) > at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1806) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:217) > at > com.ibm.sparktc.sparkbench.workload.exercise.CacheTest.doWorkload(CacheTest.scala:62) > at > com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:40) > at > 
com.ibm.sparktc.sparkbench.workload.exercise.CacheTest.run(CacheTest.scala:33) > at > com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:78) > at > com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:78) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially(SuiteKickoff.
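The fix proposed in the issue description above — catch and log IOExceptions in unpersist() so they don't kill the job — can be sketched as follows. `BlockRemover` is a hypothetical stand-in for the BlockManagerMaster block-removal RPC, not a real Spark interface.

```java
import java.io.IOException;

// Sketch of the proposed behaviour: an executor lost to dynamic allocation
// mid-unpersist produces a logged warning instead of an uncaught exception.
public class SafeUnpersist {
    interface BlockRemover {
        void removeRddBlocks(int rddId) throws IOException;
    }

    static boolean safeUnpersist(int rddId, BlockRemover remover) {
        try {
            remover.removeRddBlocks(rddId);
            return true;
        } catch (IOException e) {
            // Matches how executor loss is tolerated elsewhere: warn, move on.
            System.out.println("Ignoring IOException while unpersisting RDD "
                    + rddId + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Healthy removal succeeds...
        assert safeUnpersist(1, id -> { });
        // ...and a dead executor's connection reset no longer propagates.
        assert !safeUnpersist(2, id -> {
            throw new IOException("Connection reset by peer");
        });
    }
}
```

The worst case, as the description notes, is that some RDD blocks linger on an executor that was never reachable — which is the same outcome as that executor simply dying.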
[jira] [Commented] (SPARK-22618) RDD.unpersist can cause fatal exception when used with dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416309#comment-16416309 ] Thomas Graves commented on SPARK-22618: --- thanks for fixing this; hitting it now in spark 2.2. I think this same issue can happen with broadcast variables if it's told to wait; did you happen to look at that at the same time? > RDD.unpersist can cause fatal exception when used with dynamic allocation > - > > Key: SPARK-22618 > URL: https://issues.apache.org/jira/browse/SPARK-22618 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Brad >Assignee: Brad >Priority: Minor > Fix For: 2.3.0 > > > If you use rdd.unpersist() with dynamic allocation, then an executor can be > deallocated while your rdd is being removed, which will throw an uncaught > exception killing your job. > I looked into different ways of preventing this error from occurring but > couldn't come up with anything that wouldn't require a big change. I propose > the best fix is just to catch and log IOExceptions in unpersist() so they > don't kill your job. This will match the effective behavior when executors > are lost from dynamic allocation in other parts of the code. > In the worst case scenario I think this could lead to RDD partitions getting > left on executors after they were unpersisted, but this is probably better > than the whole job failing. I think in most cases the IOException would be > due to the executor dying for some reason, which is effectively the same > result as unpersisting the rdd from that executor anyway. > I noticed this exception in a job that loads a 100GB dataset on a cluster > where we use dynamic allocation heavily. 
Here is the relevant stack trace > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221) > at > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899) > at > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:276) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > at java.lang.Thread.run(Thread.java:748) > Exception in thread "main" org.apache.spark.SparkException: Exception thrown > in awaitResult: > at > org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:131) > at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1806) > at org.apache.spark.rdd.RDD.unpersist(RDD.scala:217) > at > com.ibm.sparktc.sparkbench.workload.exercise.CacheTest.doWorkload(CacheTest.scala:62) > at > com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:40) > at > 
com.ibm.sparktc.sparkbench.workload.exercise.CacheTest.run(CacheTest.scala:33) > at > com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:78) > at > com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:78) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) >
[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.
[ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390232#comment-16390232 ] Thomas Graves commented on SPARK-16630: --- yes, yarn tells you the # of nodemanagers: allocateResponse -> getNumClusterNodes > Blacklist a node if executors won't launch on it. > - > > Key: SPARK-16630 > URL: https://issues.apache.org/jira/browse/SPARK-16630 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.2 >Reporter: Thomas Graves >Priority: Major > > On YARN, it's possible that a node is messed up or misconfigured such that a > container won't launch on it. For instance, the Spark external shuffle > handler didn't get loaded on it, or maybe it's just some other hardware issue or > hadoop configuration issue. > It would be nice if we could recognize this happening and stop trying to launch > executors on it, since that could end up causing us to hit our max number of > executor failures and then kill the job. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387963#comment-16387963 ] Thomas Graves commented on SPARK-22683: --- I left comments on the open PR already, lets move the discussion there > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 of such jobs each day > - these jobs are usually one shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. > - experiments have been conducted with spark.executor-cores = 5 and 10, only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands splits in the data > partitioning and between 400 to 9000 seconds of wall clock time. 
> - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small regarding executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per core, with a 30% resource usage reduction, for a similar wall clock > time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments) ? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands splits can occur), and essentially remove the > interest of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependent on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. > - I've still done a series of experiments, details in the comments. 
> Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance and such old executors could become contention > points for other exes trying to remotely access blocks in the old exes (not > witnessed in the jobs I'm talking about, but we did see this behavior in > other jobs) > Proposal: > Simply add a tasksPerExecutorSlot parameter, which makes it possible to > specify how many tasks a single taskSlot should ideally execute to mitigate > the overhead of executor allocation. > PR: https://github.com/apache/spark/pull/19881 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
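The tasksPerExecutorSlot proposal quoted above boils down to a small change in the dynamic-allocation target formula. The sketch below is illustrative only (the parameter names are from the proposal, not real Spark config keys): tasksPerSlot = 1 reproduces today's one-slot-per-task behaviour, while larger values request far fewer executors at some cost in latency.

```java
// Back-of-the-envelope model of the executor target under the proposal.
public class TargetExecutors {
    static int targetExecutors(int pendingTasks, int executorCores,
                               int taskCpus, int tasksPerSlot) {
        // taskSlots per executor, as defined in the issue description.
        int slotsPerExecutor = executorCores / taskCpus;
        // Enough executors for pendingTasks, with tasksPerSlot tasks
        // queued per slot instead of exactly one.
        return (int) Math.ceil(
                (double) pendingTasks / (slotsPerExecutor * tasksPerSlot));
    }

    public static void main(String[] args) {
        // 1000 pending tasks, 5-core executors, 1 cpu per task:
        assert targetExecutors(1000, 5, 1, 1) == 200; // current behaviour
        assert targetExecutors(1000, 5, 1, 6) == 34;  // proposal @ 6 tasks/slot
    }
}
```

This is why the reporter's 6-tasks-per-slot setting cuts allocated containers so sharply while wall clock time stays close to the MR baseline.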
[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387938#comment-16387938 ] Thomas Graves commented on SPARK-22683: --- [~jcuquemelle] do you have time to update the PR, otherwise we should close that for now > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 of such jobs each day > - these jobs are usually one shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. > - experiments have been conducted with spark.executor-cores = 5 and 10, only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands splits in the data > partitioning and between 400 to 9000 seconds of wall clock time. 
> - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small regarding executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per core, with a 30% resource usage reduction, for a similar wall clock > time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments) ? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands splits can occur), and essentially remove the > interest of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependent on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. > - I've still done a series of experiments, details in the comments. 
> Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance and such old executors could become contention > points for other exes trying to remotely access blocks in the old exes (not > witnessed in the jobs I'm talking about, but we did see this behavior in > other jobs) > Proposal: > Simply add a tasksPerExecutorSlot parameter, which makes it possible to > specify how many tasks a single taskSlot should ideally execute to mitigate > the overhead of executor allocation. > PR: https://github.com/apache/spark/pull/19881 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.
[ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387846#comment-16387846 ] Thomas Graves commented on SPARK-16630: --- Yes, something along these lines is what I was thinking. We would want a configurable number of failures (perhaps we can reuse one of the existing settings, but that would need more thought), at which point we would blacklist the node due to executor launch failures, and we could have a timeout after which we retry. We also want to take small clusters into account, and perhaps stop blacklisting if a certain percentage of the cluster is already blacklisted. > Blacklist a node if executors won't launch on it. > - > > Key: SPARK-16630 > URL: https://issues.apache.org/jira/browse/SPARK-16630 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.2 >Reporter: Thomas Graves >Priority: Major > > On YARN, it's possible that a node is broken or misconfigured such that a > container won't launch on it: for instance, the Spark external shuffle > handler didn't get loaded on it, or maybe it's just some other hardware or > Hadoop configuration issue. > It would be nice if we could recognize this happening and stop trying to launch > executors on it, since that could end up causing us to hit our max number of > executor failures and kill the job. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
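A minimal sketch of the mechanics discussed in this comment: a configurable failure threshold, a retry timeout, and a guard so small clusters don't blacklist themselves out of capacity. This is illustrative Python, not Spark's actual blacklist implementation; the class, method, and threshold names are invented:

```python
import time

class LaunchFailureTracker:
    """Blacklist a node after max_failures executor launch failures, expire
    entries after timeout_s so the node is retried, and refuse to blacklist
    more than max_blacklist_fraction of the cluster."""

    def __init__(self, max_failures=3, timeout_s=3600, max_blacklist_fraction=0.5):
        self.max_failures = max_failures
        self.timeout_s = timeout_s
        self.max_blacklist_fraction = max_blacklist_fraction
        self.failures = {}     # node -> launch failure count
        self.blacklisted = {}  # node -> expiry timestamp

    def on_launch_failure(self, node, cluster_size, now=None):
        now = time.time() if now is None else now
        # Drop expired entries first so nodes get retried after the timeout.
        self.blacklisted = {n: t for n, t in self.blacklisted.items() if t > now}
        self.failures[node] = self.failures.get(node, 0) + 1
        # Small-cluster guard: never exclude more than the allowed fraction.
        if (self.failures[node] >= self.max_failures and
                (len(self.blacklisted) + 1) / cluster_size <= self.max_blacklist_fraction):
            self.blacklisted[node] = now + self.timeout_s

    def is_blacklisted(self, node, now=None):
        now = time.time() if now is None else now
        return node in self.blacklisted and self.blacklisted[node] > now
```

The scheduler would consult `is_blacklisted` before requesting containers on a node; reusing an existing blacklist setting for `max_failures`, as suggested above, is one open question.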
[jira] [Comment Edited] (SPARK-23567) spark.redaction.regex should not include user by default, docs not updated
[ https://issues.apache.org/jira/browse/SPARK-23567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383723#comment-16383723 ] Thomas Graves edited comment on SPARK-23567 at 3/2/18 3:57 PM: --- I also question whether the url should be redacted by default; given the default example, I'm not sure how the url here is a security issue. {code:java} SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), ErrorIfExists{code} was (Author: tgraves): I also question whether the url should be redacted by default, but I would want to look more at SPARK-22479 to understand what url was hidden since the Jira doesn't have an example. > spark.redaction.regex should not include user by default, docs not updated > -- > > Key: SPARK-23567 > URL: https://issues.apache.org/jira/browse/SPARK-23567 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > SPARK-22479 changed to redact the user name by default. I would argue > username isn't something that should be redacted by default, and it's very > useful for debugging and other things. If people are running super secure and > want to turn it on they can, but I don't see the user name as a default > security setting. There are also other ways on the UI to see the user name; > for instance on YARN you can go to the Environment page and look at the > resources to see the username in the paths. > Also the Jira did not update the default setting in the docs, so the docs are > out of date: > http://spark.apache.org/docs/2.2.1/configuration.html -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
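The mechanism under debate is key-based redaction: option keys matching spark.redaction.regex have their values masked before the plan is logged. The sketch below illustrates this in Python; the exact pattern and mask string are assumptions for the example, not quoted from Spark, and widening the pattern to also match "url" and "user" is precisely what this issue questions:

```python
import re

# Narrow pattern: mask only secret-like keys. Adding "|url|user" would
# reproduce the over-redaction being debated here.
REDACTION_PATTERN = re.compile(r"(?i)secret|password")

def redact_options(options, pattern=REDACTION_PATTERN):
    """Return a copy of the options map with values of matching keys masked."""
    return {k: "*********(redacted)" if pattern.search(k) else v
            for k, v in options.items()}

opts = {"dbtable": "test10", "driver": "org.postgresql.Driver",
        "url": "jdbc:postgresql:sparkdb", "password": "pass"}
# redact_options(opts) masks the password while leaving url, driver, and
# dbtable readable for debugging.
```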
[jira] [Commented] (SPARK-22479) SaveIntoDataSourceCommand logs jdbc credentials
[ https://issues.apache.org/jira/browse/SPARK-22479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383733#comment-16383733 ] Thomas Graves commented on SPARK-22479: --- Also the example above shows the password, but the password should have been already redacted, this pr excluded url, user, and username. Was the password not being redacted for some reason? > SaveIntoDataSourceCommand logs jdbc credentials > --- > > Key: SPARK-22479 > URL: https://issues.apache.org/jira/browse/SPARK-22479 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Onur Satici >Assignee: Onur Satici >Priority: Major > Fix For: 2.2.1, 2.3.0 > > > JDBC credentials are not redacted in plans including a > 'SaveIntoDataSourceCommand'. > Steps to reproduce: > {code} > spark-shell --packages org.postgresql:postgresql:42.1.1 > {code} > {code} > import org.apache.spark.sql.execution.QueryExecution > import org.apache.spark.sql.util.QueryExecutionListener > val listener = new QueryExecutionListener { > override def onFailure(funcName: String, qe: QueryExecution, exception: > Exception): Unit = {} > override def onSuccess(funcName: String, qe: QueryExecution, duration: > Long): Unit = { > System.out.println(qe.toString()) > } > } > spark.listenerManager.register(listener) > spark.range(100).write.format("jdbc").option("url", > "jdbc:postgresql:sparkdb").option("password", "pass").option("driver", > "org.postgresql.Driver").option("dbtable", "test").save() > {code} > The above will yield the following plan: > {code} > == Parsed Logical Plan == > SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > ErrorIfExists >+- Range (0, 100, step=1, splits=Some(8)) > == Analyzed Logical Plan == > SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > 
ErrorIfExists >+- Range (0, 100, step=1, splits=Some(8)) > == Optimized Logical Plan == > SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > ErrorIfExists >+- Range (0, 100, step=1, splits=Some(8)) > == Physical Plan == > ExecutedCommand >+- SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > ErrorIfExists > +- Range (0, 100, step=1, splits=Some(8)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23567) spark.redaction.regex should not include user by default, docs not updated
[ https://issues.apache.org/jira/browse/SPARK-23567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383723#comment-16383723 ] Thomas Graves commented on SPARK-23567: --- I also question whether the url should be redacted by default, but I would want to look more at SPARK-22479 to understand what url was hidden since the Jira doesn't have an example. > spark.redaction.regex should not include user by default, docs not updated > -- > > Key: SPARK-23567 > URL: https://issues.apache.org/jira/browse/SPARK-23567 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Thomas Graves >Priority: Major > > SPARK-22479 changed to redact the user name by default. I would argue > username isn't something that should be redacted by default and its very > useful for debugging and other things. If people are running super secure and > want to turn it on they can but I don't see the user name as a default > security setting. There are also other ways on the UI to see the user name, > for instance on yarn you can go to the Environment page and looking at the > resources and see the username in the paths. > Also the Jira did not update the default setting in the docs, so the docs are > out of date: > http://spark.apache.org/docs/2.2.1/configuration.html -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23567) spark.redaction.regex should not include user by default, docs not updated
Thomas Graves created SPARK-23567: - Summary: spark.redaction.regex should not include user by default, docs not updated Key: SPARK-23567 URL: https://issues.apache.org/jira/browse/SPARK-23567 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Reporter: Thomas Graves SPARK-22479 changed to redact the user name by default. I would argue username isn't something that should be redacted by default and its very useful for debugging and other things. If people are running super secure and want to turn it on they can but I don't see the user name as a default security setting. There are also other ways on the UI to see the user name, for instance on yarn you can go to the Environment page and looking at the resources and see the username in the paths. Also the Jira did not update the default setting in the docs, so the docs are out of date: http://spark.apache.org/docs/2.2.1/configuration.html -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22479) SaveIntoDataSourceCommand logs jdbc credentials
[ https://issues.apache.org/jira/browse/SPARK-22479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383709#comment-16383709 ] Thomas Graves commented on SPARK-22479: --- [~aash] [~onursatici] this seems to have redacted user names as well as passwords. We specifically added the User: field to the UI, and now it's being redacted, which makes debugging harder. The user name does not seem like something that needs to be redacted by default; what is the reasoning behind that? Note that, at least on YARN, there are other ways to easily see the username on the UI (like the resource paths), so it's definitely not a complete solution anyway. > SaveIntoDataSourceCommand logs jdbc credentials > --- > > Key: SPARK-22479 > URL: https://issues.apache.org/jira/browse/SPARK-22479 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Onur Satici >Assignee: Onur Satici >Priority: Major > Fix For: 2.2.1, 2.3.0 > > > JDBC credentials are not redacted in plans including a > 'SaveIntoDataSourceCommand'. 
> Steps to reproduce: > {code} > spark-shell --packages org.postgresql:postgresql:42.1.1 > {code} > {code} > import org.apache.spark.sql.execution.QueryExecution > import org.apache.spark.sql.util.QueryExecutionListener > val listener = new QueryExecutionListener { > override def onFailure(funcName: String, qe: QueryExecution, exception: > Exception): Unit = {} > override def onSuccess(funcName: String, qe: QueryExecution, duration: > Long): Unit = { > System.out.println(qe.toString()) > } > } > spark.listenerManager.register(listener) > spark.range(100).write.format("jdbc").option("url", > "jdbc:postgresql:sparkdb").option("password", "pass").option("driver", > "org.postgresql.Driver").option("dbtable", "test").save() > {code} > The above will yield the following plan: > {code} > == Parsed Logical Plan == > SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > ErrorIfExists >+- Range (0, 100, step=1, splits=Some(8)) > == Analyzed Logical Plan == > SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > ErrorIfExists >+- Range (0, 100, step=1, splits=Some(8)) > == Optimized Logical Plan == > SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > ErrorIfExists >+- Range (0, 100, step=1, splits=Some(8)) > == Physical Plan == > ExecutedCommand >+- SaveIntoDataSourceCommand jdbc, Map(dbtable -> test10, driver -> > org.postgresql.Driver, url -> jdbc:postgresql:sparkdb, password -> pass), > ErrorIfExists > +- Range (0, 100, step=1, splits=Some(8)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (YARN-7935) Expose container's hostname to applications running within the docker container
[ https://issues.apache.org/jira/browse/YARN-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374598#comment-16374598 ] Thomas Graves commented on YARN-7935: - thanks for the explanation Mridul. I'm fine with waiting on the spark Jira til you know the scope better, I'm currently not doing anything with bridge mode so won't be able to help there at this point. > Expose container's hostname to applications running within the docker > container > --- > > Key: YARN-7935 > URL: https://issues.apache.org/jira/browse/YARN-7935 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-7935.1.patch, YARN-7935.2.patch > > > Some applications have a need to bind to the container's hostname (like > Spark) which is different from the NodeManager's hostname(NM_HOST which is > available as an env during container launch) when launched through Docker > runtime. The container's hostname can be exposed to applications via an env > CONTAINER_HOSTNAME. Another potential candidate is the container's IP but > this can be addressed in a separate jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
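From the application side, consuming the proposed env var is straightforward. This sketch shows how a process launched in the container might pick up the proposed CONTAINER_HOSTNAME and fall back to NM_HOST (which the description says is already set at container launch); the fallback chain itself is an assumption for illustration:

```python
import os

def bind_hostname(env=os.environ):
    """Hostname an in-container application (e.g. a Spark executor) could
    bind to: prefer the container's own hostname when the Docker runtime
    exposes it, otherwise the NodeManager host, otherwise localhost."""
    return env.get("CONTAINER_HOSTNAME", env.get("NM_HOST", "localhost"))
```

This matches the comment above: Spark would still have to read the value from the environment and pass it into its executor launch path.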
[jira] [Commented] (YARN-7935) Expose container's hostname to applications running within the docker container
[ https://issues.apache.org/jira/browse/YARN-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373039#comment-16373039 ] Thomas Graves commented on YARN-7935: - [~mridulm80] what is the spark Jira for this? If this goes in it will still have to grab this from env to pass in to the executorRunnable. > Expose container's hostname to applications running within the docker > container > --- > > Key: YARN-7935 > URL: https://issues.apache.org/jira/browse/YARN-7935 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-7935.1.patch, YARN-7935.2.patch > > > Some applications have a need to bind to the container's hostname (like > Spark) which is different from the NodeManager's hostname(NM_HOST which is > available as an env during container launch) when launched through Docker > runtime. The container's hostname can be exposed to applications via an env > CONTAINER_HOSTNAME. Another potential candidate is the container's IP but > this can be addressed in a separate jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357346#comment-16357346 ] Thomas Graves commented on SPARK-23309: --- sorry I haven't had time to make a query/dataset to reproduce that. I'm ok with this not being a blocker for 2.3. > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
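The comparison methodology described in the issue (averaging repeated runs of the same query against already-cached data) can be harnessed generically. A sketch, with the real Spark query substituted for the callable; names are illustrative:

```python
import time

def avg_query_seconds(run_query, warmups=1, iters=5):
    """Average wall clock time of run_query over iters runs, after warmups
    runs that are discarded (e.g. the run that populates the cache)."""
    for _ in range(warmups):
        run_query()
    start = time.perf_counter()
    for _ in range(iters):
        run_query()
    return (time.perf_counter() - start) / iters
```

Running the same harness against the same cluster under 2.2 and 2.3 is what yields the 13s vs. 17s averages quoted above.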
[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356168#comment-16356168 ] Thomas Graves commented on SPARK-22683: --- I agree, I think default behavior stays 1. I ran a few tests with this patch. I definitely see an improvement in resource usage across all the jobs I ran. The jobs were similar job run time or actually faster on a few. I used default 60 second timeout. > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 of such jobs each day > - these jobs are usually one shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. 
> - experiments have been conducted with spark.executor-cores = 5 and 10, only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands splits in the data > partitioning and between 400 to 9000 seconds of wall clock time. > - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small regarding executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per core, with a 30% resource usage reduction, for a similar wall clock > time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments) ? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands splits can occur), and essentially remove the > interest of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependant on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. > - I've still done a series of experiments, details in the comments. 
> Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance and such old executors could become contention > points for other exes trying to remotely access blocks in the old exes (not > witnessed in the jobs I'm talking about, but we did see this behavior in > other jobs) > Proposal: > Simply add a tasksPerExecutorSlot parameter, which makes it possible to > specify how many tasks a single taskSlot should ideally execute to mitigate > the overhead of executor allocation. > PR: https://github.com/apache/spark/pull/19881 -- This message was sent by Atlassian JIRA (v7.6.3#76005) ---
[jira] [Comment Edited] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356168#comment-16356168 ] Thomas Graves edited comment on SPARK-22683 at 2/7/18 10:24 PM: I agree, I think default behavior stays 1. I ran a few tests with this patch. I definitely see an improvement in resource usage across all the jobs I ran. The jobs were similar job run time or actually faster on a few. I used default 60 second timeout. Note none of those jobs were really long running. small to medium size tasks. was (Author: tgraves): I agree, I think default behavior stays 1. I ran a few tests with this patch. I definitely see an improvement in resource usage across all the jobs I ran. The jobs were similar job run time or actually faster on a few. I used default 60 second timeout. > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 of such jobs each day > - these jobs are usually one shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic 
allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. > - experiments have been conducted with spark.executor-cores = 5 and 10, only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands splits in the data > partitioning and between 400 to 9000 seconds of wall clock time. > - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small regarding executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per core, with a 30% resource usage reduction, for a similar wall clock > time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments) ? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands splits can occur), and essentially remove the > interest of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependant on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. 
> - I've still done a series of experiments, details in the comments. > Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance and such old executors could become contention > points for other exes trying to remotely access blocks in the old exes (not > witnessed in
[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355920#comment-16355920 ] Thomas Graves commented on SPARK-22683: --- If the config is set to 1, which keeps the current behavior, the job server pattern (and really any other application) won't be affected by default. I don't see this as any different than tuning max executors, for example; really this is just a more dynamic max executors. I agree with you that this isn't optimal in some ways. For instance, it applies across the entire application, where you could run multiple jobs and stages; each of those might not want this config, but that is a different problem, where we would need to support per-stage configuration, for example. If it's a single application with serial jobs, then you should be able to set this between jobs programmatically (although I haven't tested this), but if that doesn't work, all the dynamic allocation configs would have the same issue. 
[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1637#comment-1637 ] Thomas Graves commented on SPARK-22683: --- ok thanks, I would like to try this out myself on a few jobs, but my opinion is we should put this config in, if others have strong disagreement please speak up, otherwise I think we can move the discussion to the PR. I do think we need to change the name of the config. > DynamicAllocation wastes resources by allocating containers that will barely > be used > > > Key: SPARK-22683 > URL: https://issues.apache.org/jira/browse/SPARK-22683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.0, 2.2.0 >Reporter: Julien Cuquemelle >Priority: Major > Labels: pull-request-available > > While migrating a series of jobs from MR to Spark using dynamicAllocation, > I've noticed almost a doubling (+114% exactly) of resource consumption of > Spark w.r.t MR, for a wall clock time gain of 43% > About the context: > - resource usage stands for vcore-hours allocation for the whole job, as seen > by YARN > - I'm talking about a series of jobs because we provide our users with a way > to define experiments (via UI / DSL) that automatically get translated to > Spark / MR jobs and submitted on the cluster > - we submit around 500 of such jobs each day > - these jobs are usually one shot, and the amount of processing can vary a > lot between jobs, and as such finding an efficient number of executors for > each job is difficult to get right, which is the reason I took the path of > dynamic allocation. > - Some of the tests have been scheduled on an idle queue, some on a full > queue. 
> - experiments have been conducted with spark.executor-cores = 5 and 10, only > results for 5 cores have been reported because efficiency was overall better > than with 10 cores > - the figures I give are averaged over a representative sample of those jobs > (about 600 jobs) ranging from tens to thousands splits in the data > partitioning and between 400 to 9000 seconds of wall clock time. > - executor idle timeout is set to 30s; > > Definition: > - let's say an executor has spark.executor.cores / spark.task.cpus taskSlots, > which represent the max number of tasks an executor will process in parallel. > - the current behaviour of the dynamic allocation is to allocate enough > containers to have one taskSlot per task, which minimizes latency, but wastes > resources when tasks are small regarding executor allocation and idling > overhead. > The results using the proposal (described below) over the job sample (600 > jobs): > - by using 2 tasks per taskSlot, we get a 5% (against -114%) reduction in > resource usage, for a 37% (against 43%) reduction in wall clock time for > Spark w.r.t MR > - by trying to minimize the average resource consumption, I ended up with 6 > tasks per core, with a 30% resource usage reduction, for a similar wall clock > time w.r.t. MR > What did I try to solve the issue with existing parameters (summing up a few > points mentioned in the comments) ? > - change dynamicAllocation.maxExecutors: this would need to be adapted for > each job (tens to thousands splits can occur), and essentially remove the > interest of using the dynamic allocation. > - use dynamicAllocation.backlogTimeout: > - setting this parameter right to avoid creating unused executors is very > dependant on wall clock time. One basically needs to solve the exponential > ramp up for the target time. So this is not an option for my use case where I > don't want a per-job tuning. > - I've still done a series of experiments, details in the comments. 
> Result is that after manual tuning, the best I could get was a similar > resource consumption at the expense of 20% more wall clock time, or a similar > wall clock time at the expense of 60% more resource consumption than what I > got using my proposal @ 6 tasks per slot (this value being optimized over a > much larger range of jobs as already stated) > - as mentioned in another comment, tampering with the exponential ramp up > might yield task imbalance and such old executors could become contention > points for other exes trying to remotely access blocks in the old exes (not > witnessed in the jobs I'm talking about, but we did see this behavior in > other jobs) > Proposal: > Simply add a tasksPerExecutorSlot parameter, which makes it possible to > specify how many tasks a single taskSlot should ideally execute to mitigate > the overhead of executor allocation. > PR: https://github.com/apache/spark/pull/19881 -- This message was sent by Atlassian JIRA (v7.6.3#76005) -
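To make the proposal concrete: today the allocator targets one task slot per pending task, and the proposal divides that target by a tasks-per-slot factor. A rough sketch of the arithmetic, assuming the config (whose name was still under discussion above) simply acts as a divisor — the function name and signature here are illustrative, not Spark's actual code:

```scala
// Hypothetical sketch of how a tasks-per-slot divisor would shrink the
// dynamic-allocation executor target. Names are illustrative, not Spark's.
def targetExecutors(pendingTasks: Int,
                    executorCores: Int,
                    taskCpus: Int,
                    tasksPerSlot: Int): Int = {
  val slotsPerExecutor = executorCores / taskCpus
  // tasksPerSlot = 1 reproduces the current one-slot-per-task behaviour;
  // larger values trade task latency for fewer, busier executors.
  math.ceil(pendingTasks.toDouble / (slotsPerExecutor * tasksPerSlot)).toInt
}

// e.g. 600 pending tasks, 5 cores per executor, 1 cpu per task:
// tasksPerSlot = 1 -> 120 executors; tasksPerSlot = 6 -> 20 executors
```

This is why the reported 30% resource reduction comes with little wall-clock cost on small-task jobs: each executor simply works through a short queue of tasks instead of idling after one.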
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354336#comment-16354336 ] Thomas Graves commented on SPARK-23309: --- I pulled in that patch ([https://github.com/apache/spark/pull/20513]) and numbers got better but am still seeing 10% slower on 2.3. (this is down from 15%) This is using the configs: --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false --conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false has anyone else reproduced this or is it only me seeing it? > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
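For reference, the config combination quoted above can be exercised in a spark-shell session along these lines — a sketch using the placeholder table/column names from the issue description, with the cache materialized first so only cached reads are timed (`spark.time` is available on `SparkSession` since Spark 2.1):

```scala
// Launched with:
//   spark-shell --conf spark.sql.orc.impl=hive \
//     --conf spark.sql.orc.filterPushdown=true \
//     --conf spark.sql.hive.convertMetastoreOrc=false \
//     --conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false
val dailycached = spark.sql(
  "select something from table where dt = '20170301' AND something IS NOT NULL")
dailycached.createOrReplaceTempView("dailycached")
spark.catalog.cacheTable("dailycached")

// Materialize the cache first, then time only the reads from cached data.
spark.table("dailycached").count()
spark.time(spark.sql(
  "SELECT COUNT(DISTINCT(something)) from dailycached").show())
```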
[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350901#comment-16350901 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 8:29 PM: --- I should ask is there a log statement or query plan I can dump out just to make sure spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false was applied properly? Note I did verify the symbol CACHE_VECTORIZED_READER_ENABLED was present in the jar I ran with so the config should have been set properly. was (Author: tgraves): I should ask is there a log statement or query plan I can dump out just to make sure spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false was applied properly? Note I did verify the symbol CACHE_VECTORIZED_READER_ENABLED was present in the jar I ran with so the config should have worked. > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. 
> > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
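As a side note, one low-tech way to sanity-check that a SQL config was picked up is to read it back from the running session and inspect the physical plan — a sketch, using the placeholder view name from the issue description (this shows the session's effective value, not which code path consulted it):

```scala
// Read the effective session value back (returns the default if unset):
println(spark.conf.get(
  "spark.sql.inMemoryColumnarStorage.enableVectorizedReader"))

// And check the physical plan the cached scan actually produces:
spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").explain(true)
```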
[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350901#comment-16350901 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 8:29 PM: --- I should ask is there a log statement or query plan I can dump out just to make sure spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false was applied properly? Note I did verify the symbol CACHE_VECTORIZED_READER_ENABLED was present in the jar I ran with so the config should have worked. was (Author: tgraves): I should ask is there a log statement or query plan I can dump out just to make sure spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false was applied properly? > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. 
> > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350813#comment-16350813 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 8:15 PM: --- I'm still seeing spark 2.3 slower by about 15% for the larger dataset. (times here are 301 seconds on 2.2 vs 346 seconds on 2.3) I tried => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false and then also tried setting the vectorized reader to false => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false -conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false Note the # of partitions it's processing is now the same since turning off the native orc impl. was (Author: tgraves): I'm still seeing spark 2.3 slower by about 15% for the larger dataset. (times here are 301 seconds on 2.2 vs 346 seconds on 2.3) I tried => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false and then also tried setting the vectorized reader to false => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false- -conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false Note the # of partitions it's processing is now the same since turning off the native orc impl. > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. 
> The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350901#comment-16350901 ] Thomas Graves commented on SPARK-23309: --- I should ask is there a log statement or query plan I can dump out just to make sure spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false was applied properly? > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350900#comment-16350900 ] Thomas Graves commented on SPARK-23309: --- So the last test I did was spark 2.3 with the old hive path and spark 2.2. Spark 2.3 is slower than spark 2.2 reading the cached data. [~smilegator] I already tried the patch, see the last config I tested with where -conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350813#comment-16350813 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 7:04 PM: --- I'm still seeing spark 2.3 slower by about 15% for the larger dataset. (times here are 301 seconds on 2.2 vs 346 seconds on 2.3) I tried => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false and then also tried setting the vectorized reader to false => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false- -conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false Note the # of partitions it's processing is now the same since turning off the native orc impl. was (Author: tgraves): I'm still seeing spark 2.3 slower by about 15% for the larger dataset. I tried => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false and then also tried setting the vectorized reader to false => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false --conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false Note the # of partitions it's processing is now the same since turning off the native orc impl. > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. 
> The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350813#comment-16350813 ] Thomas Graves commented on SPARK-23309: --- I'm still seeing spark 2.3 slower by about 15% for the larger dataset. I tried => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false and then also tried setting the vectorized reader to false => --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false --conf spark.sql.inMemoryColumnarStorage.enableVectorizedReader=false Note the # of partitions it's processing is now the same since turning off the native orc impl. > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. 
> > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350533#comment-16350533 ] Thomas Graves commented on SPARK-23309: --- Note the schema of "something" here is a "string". I'll try with the changes in SPARK-23312 and turn off the vectorized cache reader. I'm also running 2.3 with the configs --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=true --conf spark.sql.hive.convertMetastoreOrc=false which should be the same as 2.2 and it gives me the same # of partitions > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23304) Spark SQL coalesce() against hive not working
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-23304. --- Resolution: Invalid > Spark SQL coalesce() against hive not working > - > > Key: SPARK-23304 > URL: https://issues.apache.org/jira/browse/SPARK-23304 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: Thomas Graves >Assignee: Xiao Li >Priority: Major > Attachments: spark22_oldorc_explain.txt, spark23_oldorc_explain.txt, > spark23_oldorc_explain_convermetastoreorcfalse.txt > > > The query below seems to ignore the coalesce. This is running spark 2.2 or > spark 2.3 against hive, which is reading orc: > > Query: > spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= > '20170301' AND dt <= '20170331' AND something IS NOT > NULL").coalesce(16).show() > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350440#comment-16350440 ] Thomas Graves commented on SPARK-23304: --- ok so I guess by that logic the coalesce won't ever work with the COUNT(DISTINCT()), since it's the intermediate query I want it to apply to; it will work on the select bcookie. I tested that and verified. spark.sql("SELECT something FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT NULL").coalesce(8).show() Actually works then. So I guess we can close this; it was my misunderstanding. > Spark SQL coalesce() against hive not working > - > > Key: SPARK-23304 > URL: https://issues.apache.org/jira/browse/SPARK-23304 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: Thomas Graves >Assignee: Xiao Li >Priority: Major > Attachments: spark22_oldorc_explain.txt, spark23_oldorc_explain.txt, > spark23_oldorc_explain_convermetastoreorcfalse.txt > > > The query below seems to ignore the coalesce. This is running spark 2.2 or > spark 2.3 against hive, which is reading orc: > > Query: > spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= > '20170301' AND dt <= '20170331' AND something IS NOT > NULL").coalesce(16).show() > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
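The behaviour discussed in the comment above follows from where coalesce() sits in the plan: applied after the aggregation it only touches the single-row result, so it has to be applied to the intermediate query to affect parallelism. A sketch of the two placements, using the placeholder names from the issue:

```scala
import org.apache.spark.sql.functions.countDistinct

// Applied after the aggregate, coalesce() only repartitions the one-row
// COUNT(DISTINCT(...)) result -- the scan parallelism is unchanged:
spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT NULL")
  .coalesce(16)
  .show()

// Applied to the intermediate query, it actually caps the parallelism the
// downstream aggregation runs with:
val rows = spark.sql("SELECT something FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT NULL")
rows.coalesce(16)
  .agg(countDistinct("something"))
  .show()
```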
[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350423#comment-16350423 ] Thomas Graves commented on SPARK-23304: --- it doesn't look like sql("xyz").rdd.partitions.length comes back correct in either spark 2.2 or 2.3. But if I change the query from SELECT COUNT(DISTINCT(bcookie)) to just SELECT bcookie, then the partitions.length works. So perhaps it's something with the count spark 2.3 SELECT COUNT(DISTINCT(bcookie)) scala> query.rdd.partitions.length res4: Int = 1 scala> query.count() [Stage 5:===> (15420 + 619) / 16039] spark 2.2 SELECT COUNT(DISTINCT(bcookie)): scala> query.rdd.partitions.length res0: Int = 1 scala> query.count() [Stage 0:==> (1136 + 1600) / 5346] spark 2.2 Query with just select bcookie: scala> query.rdd.partitions.length res1: Int = 5346 spark 2.3 Query with just select bcookie: scala> query.rdd.partitions.length res9: Int = 16039 Note if I change to just be SELECT DISTINCT(bcookie) then I get 200: scala> query.rdd.partitions.length res10: Int = 200 > Spark SQL coalesce() against hive not working > - > > Key: SPARK-23304 > URL: https://issues.apache.org/jira/browse/SPARK-23304 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: Thomas Graves >Assignee: Xiao Li >Priority: Major > Attachments: spark22_oldorc_explain.txt, spark23_oldorc_explain.txt, > spark23_oldorc_explain_convermetastoreorcfalse.txt > > > The query below seems to ignore the coalesce. This is running spark 2.2 or > spark 2.3 against hive, which is reading orc: > > Query: > spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= > '20170301' AND dt <= '20170331' AND something IS NOT > NULL").coalesce(16).show() > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
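The numbers in the comment above are consistent with df.rdd.partitions.length describing the final stage of the physical plan rather than the scan: 1 for a global aggregate, the spark.sql.shuffle.partitions default (200) after a DISTINCT, and the input split count for a bare projection. In sketch form, with the issue's placeholder names:

```scala
// rdd.partitions.length reflects the last stage of the plan,
// not the number of input splits read:
spark.sql("SELECT bcookie FROM sometable")
  .rdd.partitions.length   // ~ number of input splits (5346 / 16039 above)

spark.sql("SELECT DISTINCT(bcookie) FROM sometable")
  .rdd.partitions.length   // spark.sql.shuffle.partitions, 200 by default

spark.sql("SELECT COUNT(DISTINCT(bcookie)) FROM sometable")
  .rdd.partitions.length   // 1 -- a global aggregate ends in one partition
```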
[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350428#comment-16350428 ] Thomas Graves commented on SPARK-23304: --- well I guess that gives you the final # of partitions and not the # it will initially be reading > Spark SQL coalesce() against hive not working > - > > Key: SPARK-23304 > URL: https://issues.apache.org/jira/browse/SPARK-23304 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: Thomas Graves >Assignee: Xiao Li >Priority: Major > Attachments: spark22_oldorc_explain.txt, spark23_oldorc_explain.txt, > spark23_oldorc_explain_convermetastoreorcfalse.txt > > > The query below seems to ignore the coalesce. This is running spark 2.2 or > spark 2.3 against hive, which is reading orc: > > Query: > spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= > '20170301' AND dt <= '20170331' AND something IS NOT > NULL").coalesce(16).show() > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349555#comment-16349555 ] Thomas Graves commented on SPARK-23304: --- I don't have any hive tables backed by parquet to compare to. > Spark SQL coalesce() against hive not working > - > > Key: SPARK-23304 > URL: https://issues.apache.org/jira/browse/SPARK-23304 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: Thomas Graves >Assignee: Xiao Li >Priority: Major > Attachments: spark22_oldorc_explain.txt, spark23_oldorc_explain.txt, > spark23_oldorc_explain_convermetastoreorcfalse.txt > > > The query below seems to ignore the coalesce. This is running spark 2.2 or > spark 2.3 against hive, which is reading orc: > > Query: > spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= > '20170301' AND dt <= '20170331' AND something IS NOT > NULL").coalesce(16).show() > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349553#comment-16349553 ] Thomas Graves commented on SPARK-23309: --- [~dongjoon] is there any native way with the native hive to control the # of partitions? (like hive.exec.orc.split.strategy). Or do you have to do the coalesce? > Spark 2.3 cached query performance 20-30% worse then spark 2.2 > -- > > Key: SPARK-23309 > URL: https://issues.apache.org/jira/browse/SPARK-23309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Priority: Blocker > > I was testing spark 2.3 rc2 and I am seeing a performance regression in sql > queries on cached data. > The size of the data: 10.4GB input from hive orc files /18.8 GB cached/5592 > partitions > Here is the example query: > val dailycached = spark.sql("select something from table where dt = > '20170301' AND something IS NOT NULL") > dailycached.createOrReplaceTempView("dailycached") > spark.catalog.cacheTable("dailyCached") > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > > On spark 2.2 I see queries times average 13 seconds > On the same nodes I see spark 2.3 queries times average 17 seconds > Note these are times of queries after the initial caching. so just running > the last line again: > spark.sql("SELECT COUNT(DISTINCT(something)) from dailycached").show() > multiple times. > > I also ran a query over more data (335GB input/587.5 GB cached) and saw a > similar discrepancy in the performance of querying cached data between spark > 2.3 and spark 2.2, where 2.2 was better by like 20%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
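On the question above about controlling the # of partitions: as far as I know, the native reader computes splits through the generic file source rather than hive.exec.orc.split.strategy, so the usual file-source settings should apply. A sketch — the specific values are illustrative, and whether this fully matches the native ORC path in 2.3 is an assumption:

```scala
// With spark.sql.orc.impl=native, input splits come from the generic file
// source, so partition counts respond to the file-source knobs:
spark.conf.set("spark.sql.files.maxPartitionBytes", 256L * 1024 * 1024) // larger splits -> fewer partitions
spark.conf.set("spark.sql.files.openCostInBytes", 8L * 1024 * 1024)     // weight given to opening small files

val df = spark.sql(
  "select something from table where dt = '20170301' AND something IS NOT NULL")
println(df.rdd.getNumPartitions)
```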
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349551#comment-16349551 ] Thomas Graves commented on SPARK-23309:
---
Seeing the same time difference after adding in the spark.table("dailyCached").count().

[~dongjoon] Correct, this is only when reading from cached data. Without caching, spark 2.3 is quite a bit faster (1.5-2x+) than spark 2.2 when reading from hive using orc (which is awesome, thanks for all the work!).

I'm running now with --conf spark.sql.orc.impl=hive --conf spark.sql.hive.convertMetastoreOrc=false. For the smaller data set it did get closer, only a 1 second diff on average between spark 2.2 and spark 2.3. Trying to run on the larger dataset now. I'm wondering if much of the difference is the larger # of partitions you get with the native reader in spark 2.3.
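The fallback configuration mentioned in the comment above can also be set when building the session rather than on the spark-shell command line. A sketch of the equivalent session setup (the app name is made up for illustration; the two conf keys are the ones quoted in the comment):

```scala
import org.apache.spark.sql.SparkSession

// Equivalent of:
//   spark-shell --conf spark.sql.orc.impl=hive \
//               --conf spark.sql.hive.convertMetastoreOrc=false
val spark = SparkSession.builder()
  .appName("orc-reader-comparison") // hypothetical name
  .enableHiveSupport()
  .config("spark.sql.orc.impl", "hive")                  // use the pre-2.3 Hive ORC reader
  .config("spark.sql.hive.convertMetastoreOrc", "false") // don't convert metastore ORC tables to the native reader
  .getOrCreate()
```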
[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse than spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349483#comment-16349483 ] Thomas Graves commented on SPARK-23309:
---
Sure, I can also run with the --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=false --conf spark.sql.hive.convertMetastoreOrc=false configs to make sure it doesn't just have to do with the # of partitions.
[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349480#comment-16349480 ] Thomas Graves commented on SPARK-23304:
---
I just ran the query (show()) and saw the # of partitions. spark23_oldorc_explain_convermetastoreorcfalse.txt is the explain with --conf spark.sql.orc.impl=hive --conf spark.sql.orc.filterPushdown=false --conf spark.sql.hive.convertMetastoreOrc=false.