You are right. Too many tasks were created. How can we reduce the number of tasks?
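For reference, here is a rough model of where the task count of a simple scan can come from. It is a sketch based on a reading of Spark 2.x's file-source partitioning (FileSourceScanExec), not Spark's actual code. It assumes the ORC table is read through Spark's built-in file source rather than the Hive SerDe (in Spark 2.3 that path is, as far as I know, enabled with spark.sql.hive.convertMetastoreOrc=true), and that each input partition becomes one scan task. The defaults of 128 MB for spark.sql.files.maxPartitionBytes and 4 MB for spark.sql.files.openCostInBytes are Spark's; the function names and the example file sizes and parallelism values are hypothetical.

```python
# Hypothetical, simplified model of how Spark 2.x's built-in file sources
# group files into input partitions; one partition ~ one scan task.
# Not Spark's actual code: bucketing, non-splittable files and
# partition pruning are ignored.

MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB,  # spark.sql.files.maxPartitionBytes
                    open_cost_in_bytes=4 * MB):    # spark.sql.files.openCostInBytes
    """Target partition size: the input is spread over ~default_parallelism."""
    # Every file is charged an extra "open cost" on top of its real size.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


def count_partitions(file_sizes, default_parallelism,
                     max_partition_bytes=128 * MB,
                     open_cost_in_bytes=4 * MB):
    """Number of input partitions (scan tasks) under this model."""
    split = max_split_bytes(file_sizes, default_parallelism,
                            max_partition_bytes, open_cost_in_bytes)

    # Cut each file into chunks of at most `split` bytes (empty files vanish).
    chunks = []
    for size in file_sizes:
        for offset in range(0, size, split):
            chunks.append(min(split, size - offset))

    # "Next fit decreasing": largest chunks first; close the current
    # partition when the next chunk would overflow it.
    chunks.sort(reverse=True)
    partitions = 0
    current_size = 0
    current_count = 0
    for chunk in chunks:
        if current_count > 0 and current_size + chunk > split:
            partitions += 1
            current_size = 0
            current_count = 0
        current_size += chunk + open_cost_in_bytes
        current_count += 1
    if current_count > 0:
        partitions += 1
    return partitions


if __name__ == "__main__":
    KB = 1024
    # One tiny file: a single task.
    print(count_partitions([100 * KB], default_parallelism=12))        # 1
    # 100 tiny files: the task count roughly tracks default_parallelism...
    print(count_partitions([10 * KB] * 100, default_parallelism=12))   # 12
    # ...and grows with it, up to one task per file.
    print(count_partitions([10 * KB] * 100, default_parallelism=200))  # 100
```

Under this model, a table stored as a single small file yields one scan task, while many small files yield roughly min(file count, default parallelism) tasks. Compacting small files or calling coalesce() on the DataFrame are the usual levers; whether task overhead actually explains the 7-8 seconds here would still need to be confirmed in the Spark UI.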
On Thu, Mar 29, 2018, 7:44 AM Lalwani, Jayesh <jayesh.lalw...@capitalone.com> wrote:

> Without knowing too many details, I can only guess. It could be that Spark
> is creating a lot of tasks even though there are few records. Creating and
> distributing tasks has a noticeable overhead on smaller datasets.
>
> You might want to look at the driver logs or the Spark Application Detail
> UI.
>
> *From: *Tin Vu <tvu...@ucr.edu>
> *Date: *Wednesday, March 28, 2018 at 8:04 PM
> *To: *"user@spark.apache.org" <user@spark.apache.org>
> *Subject: *[SparkSQL] SparkSQL performance on small TPCDS tables is very
> low when compared to Drill or Presto
>
> Hi,
>
> I am running a benchmark to compare the performance of SparkSQL, Apache
> Drill and Presto. My experimental setup:
>
> · TPCDS dataset with scale factor 100 (size 100GB).
> · Spark, Drill and Presto have the same number of workers: 12.
> · Each worker has the same amount of allocated memory: 4GB.
> · Data is stored by Hive in ORC format.
>
> I executed a very simple SQL query: "SELECT * from table_name"
> The issue is that for some small tables (even tables with a few dozen
> records), SparkSQL still needed about 7-8 seconds to finish, while Drill
> and Presto needed less than 1 second.
> For other, large tables with billions of records, SparkSQL performance was
> reasonable: it needed 20-30 seconds to scan the whole table.
> Do you have any idea or reasonable explanation for this issue?
>
> Thanks,