data locality in spark

2015-04-27 Thread Grandl Robert
Hi guys, I am running some SQL queries, but all my tasks are reported as either NODE_LOCAL or PROCESS_LOCAL. In the Hadoop world, the reduce tasks are RACK- or NON-RACK-LOCAL because they have to aggregate data from multiple hosts. However, in Spark even the aggregation stages are reported ...
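
A minimal sketch of how to check which locality level the scheduler actually assigned to each task, via a SparkListener; the listener class and its name are illustrative, not from the thread, and sc is assumed to be an existing SparkContext (e.g., the spark-shell one):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Illustrative listener: logs the locality level the scheduler chose for
    // every finished task. In 1.x, taskLocality is one of PROCESS_LOCAL,
    // NODE_LOCAL, RACK_LOCAL, NO_PREF or ANY.
    class LocalityListener extends SparkListener {
      override def onTaskEnd(end: SparkListenerTaskEnd): Unit =
        println(s"stage=${end.stageId} task=${end.taskInfo.taskId} " +
          s"locality=${end.taskInfo.taskLocality}")
    }

    sc.addSparkListener(new LocalityListener)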

Re: counters in spark

2015-04-13 Thread Grandl Robert
Guys, do you have any thoughts on this? Thanks, Robert. On Sunday, April 12, 2015 5:35 PM, Grandl Robert rgra...@yahoo.com.INVALID wrote: Hi guys, I was trying to figure out some counters in Spark, related to the amount of CPU or memory (in some metric) used by a task/stage ...

counters in spark

2015-04-12 Thread Grandl Robert
Hi guys, I was trying to figure out some counters in Spark, related to the amount of CPU or memory (in some metric) used by a task/stage/job, but I could not find any. Is there any such counter available? Thank you, Robert
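
A minimal sketch of reading the built-in per-task counters that do exist, Spark's TaskMetrics, from a listener; the field names assume the 1.x-era API and the listener itself is illustrative. Note these cover wall-clock run time, GC time and spill bytes; as far as I know there was no pure CPU-time counter in this era's TaskMetrics, which may be why the search came up empty.

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Illustrative listener: prints a few built-in per-task counters.
    class TaskCounterListener extends SparkListener {
      override def onTaskEnd(end: SparkListenerTaskEnd): Unit = {
        val m = end.taskMetrics
        if (m != null) {                     // metrics can be absent on failure
          println(s"stage=${end.stageId} runTime=${m.executorRunTime}ms " +
            s"gc=${m.jvmGCTime}ms spilled=${m.memoryBytesSpilled}B")
        }
      }
    }

    sc.addSparkListener(new TaskCounterListener)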

question regarding the dependency DAG in Spark

2015-03-16 Thread Grandl Robert
Hi guys, I am trying to get a better understanding of the DAG generation for a job in Spark. Ideally, what I want is to run some SQL query and extract the DAG generated by Spark. By DAG I mean the stages, the dependencies among stages, and the number of tasks in every stage. Could you guys ...
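
One low-tech way to get at this, sketched below: calling toDebugString on the RDD behind a query prints its lineage, where each indentation level marks a shuffle dependency, i.e. a stage boundary, and the (N) prefix on each line is the partition (task) count. The sqlContext handle and the lineitem table are assumptions borrowed from the TPC-H setup mentioned in the other threads.

    // Assumes a SQLContext/HiveContext named sqlContext with a registered
    // lineitem table (hypothetical; any query works the same way).
    val result = sqlContext.sql(
      "SELECT l_orderkey, SUM(l_extendedprice) FROM lineitem GROUP BY l_orderkey")

    // Indentation = shuffle dependency = stage boundary; (N) = partition count.
    println(result.toDebugString)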

run spark standalone mode

2015-03-12 Thread Grandl Robert
Hi guys, I have a stupid question, but I am not sure how to get out of it. I deployed Spark 1.2.1 on a cluster of 30 nodes. Looking at master:8088 I can see all the workers I have created so far. (I start the cluster with sbin/start-all.sh.) However, when running a Spark SQL query or even ...

Re: run spark standalone mode

2015-03-12 Thread Grandl Robert
Sorry guys for this. It seems that I need to start the thrift server with the --master spark://ms0220:7077 option, and now I can see applications running in my web UI. Thanks, Robert. On Thursday, March 12, 2015 10:57 AM, Grandl Robert rgra...@yahoo.com.INVALID wrote: I figured out ...
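
The same fix applies to a standalone application: unless the master URL is set explicitly (or passed via --master), the job runs in local mode and never registers with the cluster, so nothing shows up in the master's web UI. A minimal sketch, with the master URL taken from the thread and the app name invented:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("tpch-queries")         // illustrative name
      .setMaster("spark://ms0220:7077")   // the standalone master from above
    val sc = new SparkContext(conf)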

Re: run spark standalone mode

2015-03-12 Thread Grandl Robert
I figured it out for spark-shell by passing the --master option. However, I am still troubleshooting how to launch SQL queries. My current command is like this: ./bin/beeline -u jdbc:hive2://ms0220:1 -n `whoami` -p ignored -f tpch_query10.sql. On Thursday, March 12, 2015 10:37 AM, Grandl ...

Spark SQL using Hive metastore

2015-03-11 Thread Grandl Robert
Hi guys, I am a newbie at running Spark SQL / Spark. My goal is to run some TPC-H queries atop Spark SQL using the Hive metastore. It looks like the Spark 1.2.1 release has Spark SQL / Hive support. However, I am not able to fully connect all the dots. I did the following: 1. Copied hive-site.xml ...
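
A minimal sketch of the programmatic route, assuming hive-site.xml was copied into Spark's conf/ directory (so it is on the classpath) and sc is an existing SparkContext:

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext picks up hive-site.xml from the classpath and talks to the
    // existing Hive metastore, so its tables become queryable directly.
    val hive = new HiveContext(sc)
    hive.sql("SHOW TABLES").collect().foreach(println)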

shark queries failed

2015-02-15 Thread Grandl Robert
Hi guys, I deployed BlinkDB (built atop Shark) running on Spark 0.9. I tried to run several TPC-DS Shark queries taken from https://github.com/cloudera/impala-tpcds-kit/tree/master/queries-sql92-modified/queries/shark. However, the following exceptions were encountered. Do you have any idea why ...

Re: shark queries failed

2015-02-15 Thread Grandl Robert
... 2015 9:18 AM, Akhil Das ak...@sigmoidanalytics.com wrote: I'd suggest updating your Spark to the latest version and trying Spark SQL instead of Shark. Thanks, Best Regards. On Sun, Feb 15, 2015 at 7:36 AM, Grandl Robert rgra...@yahoo.com.invalid wrote: Hi guys, I deployed BlinkDB (built atop ...

Spark standalone and HDFS 2.6

2015-02-13 Thread Grandl Robert
Hi guys, probably a dummy question: do you know how to compile Spark 0.9 to easily integrate with HDFS 2.6.0? I was trying sbt/sbt -Pyarn -Phadoop-2.6 assembly or mvn -Dhadoop.version=2.6.0 -DskipTests clean package, but neither of these approaches succeeded. Thanks, Robert

Re: Spark standalone and HDFS 2.6

2015-02-13 Thread Grandl Robert
... Grandl Robert rgra...@yahoo.com wrote: Thanks Sean for your prompt response. I was trying to compile as follows: mvn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests clean package, but I got a bunch of errors (see below). Hadoop 2.6.0 compiled correctly, and all Hadoop jars are in .m2 ...

Re: Spark standalone and HDFS 2.6

2015-02-13 Thread Grandl Robert
... for standalone mode, you don't need -Pyarn. There is no -Phadoop-2.6 profile; you should use -Phadoop-2.4 for 2.4+. Yes, set -Dhadoop.version=2.6.0. That should be it. If that still doesn't work, define "doesn't succeed." On Fri, Feb 13, 2015 at 7:13 PM, Grandl Robert rgra...@yahoo.com.invalid wrote: Hi guys ...