Re: SparkSQL: Freezing while running TPC-H query 5

2014-09-24 Thread Samay
Hey Dan,

Thanks for your reply. I have a couple of questions.

1) Were you able to verify that this is caused by GC? If so, could you let
me know how?

2) If this is GC, then do you know of any tuning I can do to reduce this GC
pause?
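
One possible way to check the GC theory (a sketch, not something confirmed in
this thread) is to turn on GC logging for the executors, for example by adding
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to
spark.executor.extraJavaOptions, and then total the pause times from the
executor output. The script below assumes the HotSpot log format in which each
event ends with "[Times: ... real=N secs]"; the summarize_gc helper and the
sample lines are made up for illustration.

```python
import re

# Wall-clock ("real") time that HotSpot prints at the end of each GC event
# when -XX:+PrintGCDetails is enabled (assumed log format).
REAL_TIME = re.compile(r"real=(\d+\.\d+) secs")


def summarize_gc(lines, long_pause_secs=1.0):
    """Return (total GC seconds, list of events lasting >= long_pause_secs)."""
    pauses = [float(m.group(1)) for line in lines for m in REAL_TIME.finditer(line)]
    long_pauses = [p for p in pauses if p >= long_pause_secs]
    return sum(pauses), long_pauses


# Made-up sample lines in the parallel collector's format, for illustration only.
sample = [
    "10.5: [GC [PSYoungGen: 1024K->512K(2048K)] 4096K->3584K(8192K), "
    "0.2500000 secs] [Times: user=0.40 sys=0.00, real=0.25 secs]",
    "99.0: [Full GC [PSYoungGen: 512K->0K(2048K)] [ParOldGen: 6144K->5120K(6144K)] "
    "6656K->5120K(8192K) [PSPermGen: 2048K->2048K(4096K)], 42.5000000 secs] "
    "[Times: user=300.00 sys=1.00, real=42.50 secs]",
]

total, long_pauses = summarize_gc(sample)
print("total GC seconds:", total)
print("long pauses:", long_pauses)
```

If the long events line up with the times at which the query appears frozen,
that would support the GC explanation; if they do not, the pause probably has
another cause.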

Regards,
Samay

On Tue, Sep 23, 2014 at 11:15 PM, Dan Dietterich [via Apache Spark User
List] ml-node+s1001560n1492...@n3.nabble.com wrote:

 I have been seeing the same behavior when running large queries. My
 current theory is that the pauses are related to Java garbage collection.


--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Freezing-while-running-TPC-H-query-5-tp14902p14985.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

SparkSQL: Freezing while running TPC-H query 5

2014-09-23 Thread Samay
Hi,

I am trying to run TPC-H queries with the SparkSQL 1.1.0 CLI on EC2, using
1 r3.4xlarge master and 20 r3.4xlarge slave machines (each machine has
16 vCPUs and 122 GB of memory). The TPC-H scale factor I am using is 1000
(i.e. 1000 GB of total data).

When I try to run TPC-H query 5, the query hangs for a long time mid-query.
I've increased several timeouts to large values such as 600 seconds in order
to prevent block manager and connection ACK timeouts. I see that the CPU is
being used even during the long pauses (not just one core, but several).

Query:
select
n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer c join
( select n_name, l_extendedprice, l_discount, s_nationkey, o_custkey from
orders o join
( select n_name, l_extendedprice, l_discount, l_orderkey, s_nationkey from
lineitem l join
( select n_name, s_suppkey, s_nationkey from supplier s join
( select n_name, n_nationkey
from nation n join region r
on n.n_regionkey = r.r_regionkey and r.r_name = 'ASIA'
) n1 on s.s_nationkey = n1.n_nationkey
) s1 on l.l_suppkey = s1.s_suppkey
) l1 on l1.l_orderkey = o.o_orderkey and o.o_orderdate >= '1994-01-01'
and o.o_orderdate < '1995-01-01'
) o1
on c.c_nationkey = o1.s_nationkey and c.c_custkey = o1.o_custkey
group by n_name
order by revenue desc;

Below is an excerpt of the errors from the worker node log after the timeout.

14/09/23 14:21:25 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/09/23 14:21:25 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 5 non-empty blocks out of 320 blocks
14/09/23 14:21:25 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 5 remote fetches in 1 ms
14/09/23 14:32:12 WARN executor.Executor: Told to re-register on heartbeat
14/09/23 14:32:50 INFO storage.BlockManager: BlockManager re-registering with master
14/09/23 14:32:50 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/09/23 14:32:50 INFO storage.BlockManagerMaster: Registered BlockManager
14/09/23 14:32:50 WARN network.ConnectionManager: Could not find reference for received ack Message 338974
14/09/23 14:32:50 INFO storage.BlockManager: Reporting 507 blocks to the master.
14/09/23 14:32:50 ERROR storage.BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(ip-10-45-47-24.ec2.internal,49905)
java.io.IOException: sendMessageReliably failed because ack was not received within 600 sec
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:852)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.network.ConnectionManager$$anon$5.run(ConnectionManager.scala:852)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
14/09/23 14:33:06 ERROR storage.BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(ip-10-239-184-234.ec2.internal,50538)
java.io.IOException: sendMessageReliably failed because ack was not received within 600 sec
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:852)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.network.ConnectionManager$$anon$5.run(ConnectionManager.scala:852)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
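
To locate the silent stretches in a log like this one, the timestamps can be
compared pairwise. The following is a minimal sketch; the find_gaps helper and
the 60-second threshold are illustrative choices, and the sample lines are
three entries copied from the excerpt above. It assumes each log entry starts
with a yy/MM/dd HH:mm:ss timestamp and skips lines that do not (stack frames
and wrapped text).

```python
import re
from datetime import datetime

# A log entry starts with a timestamp such as "14/09/23 14:21:25 ".
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, min_gap_secs=60):
    """Return (previous_time, next_time, gap_seconds) for each gap >= min_gap_secs."""
    gaps = []
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # continuation line: wrapped message or stack frame
        ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap >= min_gap_secs:
                gaps.append((prev, ts, gap))
        prev = ts
    return gaps


log = [
    "14/09/23 14:21:25 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: "
    "Started 5 remote fetches in 1 ms",
    "14/09/23 14:32:12 WARN executor.Executor: Told to re-register on heartbeat",
    "14/09/23 14:32:50 INFO storage.BlockManager: BlockManager re-registering with master",
]

for start, end, secs in find_gaps(log):
    print(start.time(), "->", end.time(), secs, "seconds")
```

On these three lines the only gap above the threshold is the one between
14:21:25 and 14:32:12, i.e. 647 seconds with nothing logged by this worker.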

I have also attached a file listing the configuration parameters I am using.

Does anybody have any idea why there is such a long pause? Also, are there
any parameters I can tune to reduce it?

I am seeing similar behaviour on several other queries where there are long
pauses of 200-300s before the query starts making progress on the master.
Some of the queries complete while the others do not. Any help would be
appreciated.

Regards,
Samay

spark-defaults.conf
http://apache-spark-user-list.1001560.n3.nabble.com/file/n14902/spark-defaults.conf
  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Freezing-while-running-TPC-H-query-5-tp14902.html

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



SparkSQL TPC-H query 3 joining multiple tables

2014-09-03 Thread Samay
Hi,

I am trying to run query 3 from the TPC-H benchmark using SparkSQL. However,
I am running into errors, which I believe occur because the parser does not
accept the JOIN syntax I am using.

Below are the two syntaxes I tried and the error messages I am seeing.

Exception in thread "main" java.lang.RuntimeException: [1.159] failure:
``UNION'' expected but `join' found

SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate, o_shippriority FROM customer c join orders o on c.c_custkey =
o.o_custkey join lineitem l on l.l_orderkey = o.o_orderkey WHERE
c_mktsegment = 'BUILDING' AND o_orderdate < '1995-03-15' AND l_shipdate >
'1995-03-15' GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY
revenue desc, o_orderdate LIMIT 10;

Exception in thread "main" java.lang.RuntimeException: [1.125] failure:
``UNION'' expected but `,' found

SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate, o_shippriority FROM customer c, orders o, lineitem l WHERE
l.l_orderkey = o.o_orderkey AND c.c_custkey = o.o_custkey AND c_mktsegment =
'BUILDING' AND o_orderdate < '1995-03-15' AND l_shipdate > '1995-03-15'
GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY revenue desc,
o_orderdate LIMIT 10;

The same syntax works when I join 2 tables (TPC-H query 12 for instance).
Any ideas as to what the issue is?
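
Since two-table joins are accepted, one possible workaround is to nest them:
join customer and orders in a subquery, then join lineitem to the result. The
sketch below only checks, on made-up toy rows in SQLite, that the nested form
returns the same rows as the flat three-table join; whether this SparkSQL
parser accepts the nested form is an assumption that has not been verified
here. The comparison operators (< and >) follow the TPC-H definition of
query 3, and the alias co is illustrative.

```python
import sqlite3

# Toy stand-ins for the TPC-H tables; all rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER, c_mktsegment TEXT);
CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER,
                     o_orderdate TEXT, o_shippriority INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL,
                       l_discount REAL, l_shipdate TEXT);
INSERT INTO customer VALUES (1, 'BUILDING'), (2, 'MACHINERY');
INSERT INTO orders VALUES (10, 1, '1995-03-01', 0), (11, 1, '1995-03-20', 0),
                          (12, 2, '1995-03-01', 0);
INSERT INTO lineitem VALUES (10, 100.0, 0.5, '1995-03-20'),
                            (10, 200.0, 0.0, '1995-03-10'),
                            (11, 50.0, 0.0, '1995-03-25'),
                            (12, 70.0, 0.0, '1995-03-20');
""")

# Flat three-table join, as in the first failing query.
FLAT = """
SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) AS revenue,
       o_orderdate, o_shippriority
FROM customer c JOIN orders o ON c.c_custkey = o.o_custkey
     JOIN lineitem l ON l.l_orderkey = o.o_orderkey
WHERE c_mktsegment = 'BUILDING' AND o_orderdate < '1995-03-15'
      AND l_shipdate > '1995-03-15'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate LIMIT 10
"""

# Same query with every join limited to two inputs.
NESTED = """
SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) AS revenue,
       o_orderdate, o_shippriority
FROM lineitem l JOIN
     ( SELECT o_orderkey, o_orderdate, o_shippriority
       FROM customer c JOIN orders o ON c.c_custkey = o.o_custkey
       WHERE c_mktsegment = 'BUILDING' AND o_orderdate < '1995-03-15'
     ) co ON l.l_orderkey = co.o_orderkey
WHERE l_shipdate > '1995-03-15'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate LIMIT 10
"""

flat_rows = conn.execute(FLAT).fetchall()
nested_rows = conn.execute(NESTED).fetchall()
print(flat_rows)
print(nested_rows)
```

Both statements return the single qualifying order (order 10 with a revenue of
50.0) on the toy data, so the rewrite preserves the result there.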

Thanks in advance,
Samay



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-TPC-H-query-3-joining-multiple-tables-tp13344.html