Github user MikhailErofeev commented on the issue:
https://github.com/apache/spark/pull/19978
@srowen, yes, the processing is no longer IO-bound after backporting
SPARK-20923
Github user MikhailErofeev closed the pull request at:
https://github.com/apache/spark/pull/19978
Github user MikhailErofeev commented on the issue:
https://github.com/apache/spark/pull/19978
@squito
Your guess was right, and I can remove these blocks with
https://issues.apache.org/jira/browse/SPARK-20923. I will test the performance
after this patch and then refine or close this PR.
Github user MikhailErofeev commented on the issue:
https://github.com/apache/spark/pull/19978
Thanks for the constructive feedback.
Here is my benchmark with a step size of 1 MB. During this run the speedup was
23%; I think there was some interference on my workstation.
```
2048
```
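For context, this kind of comparison can be reproduced outside Spark with a micro-benchmark over `BufferedReader` buffer sizes. The sketch below is hypothetical, not the PR's actual benchmark; the class name, file contents, and sizes are illustrative:

```java
import java.io.*;
import java.nio.file.*;

public class BufferBench {
    // Read a file line by line with the given buffer size; return elapsed nanoseconds.
    static long timeRead(Path file, int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader r = new BufferedReader(new FileReader(file.toFile()), bufferSize)) {
            while (r.readLine() != null) { /* discard lines; we only measure read time */ }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bench", ".log");
        // Write roughly half a megabyte of synthetic log lines.
        try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
            for (int i = 0; i < 10_000; i++) {
                w.write("synthetic event line " + i + " with some padding payload");
                w.newLine();
            }
        }
        long small = timeRead(tmp, 2 * 1024);         // the current 2048-byte default
        long big   = timeRead(tmp, 30 * 1024 * 1024); // a 30M buffer, as discussed above
        System.out.println("2K buffer:  " + small + " ns");
        System.out.println("30M buffer: " + big + " ns");
        Files.delete(tmp);
    }
}
```

A single pass like this is noisy; in practice one would warm up the JIT and average several runs before trusting a speedup figure.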
Github user MikhailErofeev commented on the issue:
https://github.com/apache/spark/pull/19978
I don't mind just setting it to a higher value. Moreover, the current
default value (2048) is small in any case.
For my log files, 30M buffer was the best value (a bigger one did
GitHub user MikhailErofeev opened a pull request:
https://github.com/apache/spark/pull/19978
[SPARK-22784][CORE] Configure reading buffer size in Spark History Server
## What changes were proposed in this pull request?
Added debug logging of the time spent and the line size for each job.
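That per-line instrumentation could look roughly like the following. This is a hypothetical sketch (the class name and log message format are invented, not the patch's actual code); it shows elapsed time and line size being recorded for each line read:

```java
import java.io.*;

// Wraps a Reader and reports, for every line read, how long the read
// took and how long the line was. A real History Server patch would
// route this through the logging framework at DEBUG level instead of stderr.
public class TimedLineReader {
    private final BufferedReader in;

    public TimedLineReader(Reader r, int bufferSize) {
        this.in = new BufferedReader(r, bufferSize);
    }

    public String readLine() throws IOException {
        long start = System.nanoTime();
        String line = in.readLine();
        long elapsed = System.nanoTime() - start;
        if (line != null) {
            System.err.println("read " + line.length() + " chars in " + elapsed + " ns");
        }
        return line;
    }
}
```

Usage: wrap the event-log reader, e.g. `new TimedLineReader(new FileReader(path), bufferSize)`, and read lines as usual; the timing output makes it easy to see whether a larger buffer actually shortens per-line reads.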