Where is yarn-shuffle.jar in maven?

2016-12-12 Thread Neal Yin
Hi, For the dynamic allocation feature, I need spark-xxx-yarn-shuffle.jar. In my local Spark build, I can see it, but in Maven Central I can't find it. My build script pulls all jars from Maven Central. Is the only option to check this jar into git? Thanks, -Neal
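The yarn-shuffle jar is built from Spark's network-yarn module, so if that module is published for your Spark version it may be resolvable by coordinates instead of being checked into git. A hedged sbt sketch; the artifact id, Scala suffix, and version below are assumptions to verify against your repository:

```scala
// Hedged sketch (sbt): the spark-*-yarn-shuffle.jar is assembled from the
// network-yarn module. The coordinates below are illustrative assumptions;
// confirm the exact artifact id and version on your Maven repository.
libraryDependencies += "org.apache.spark" % "spark-network-yarn_2.11" % "2.0.2"
```

If the artifact is not published for your build of Spark, installing the locally built jar into an internal repository (rather than committing it to git) is the usual alternative.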

Re: HiveContext creation failed with Kerberos

2015-12-09 Thread Neal Yin
Date: Tuesday, December 8, 2015 at 4:09 AM To: user@spark.apache.org Subject: Re: HiveContext creation failed with Kerberos On 8 Dec 2015, at 06:52, Neal Yin <neal@workday.com<

HiveContext creation failed with Kerberos

2015-12-07 Thread Neal Yin
Hi, I am using Spark 1.5.1 with CDH 5.4.2. My cluster is Kerberos-protected. Here is pseudocode for what I am trying to do: ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("foo", "…") ugi.doAs( new PrivilegedExceptionAction() { val sparkConf: SparkConf = createSparkConf(…)
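The truncated snippet above uses Hadoop's doAs pattern, which is built on the JDK's java.security.PrivilegedExceptionAction interface: UserGroupInformation.doAs(action) invokes action.run() as the logged-in Kerberos principal. A minimal stand-alone sketch of just that shape (no cluster required); the return value and the comment placeholders are hypothetical:

```scala
import java.security.PrivilegedExceptionAction

// Minimal stand-alone sketch of the doAs pattern from the question.
// Hadoop's UserGroupInformation.doAs(action) calls action.run() under the
// principal's credentials; here we call run() directly to show the shape.
val action = new PrivilegedExceptionAction[String] {
  override def run(): String = {
    // In the real code, createSparkConf(...) and the HiveContext creation
    // would go here, executing as the Kerberos principal.
    "context-created" // hypothetical placeholder result
  }
}

val result = action.run()
println(result)
```

With a real UGI, the same action object would be passed to ugi.doAs(action) instead of being run directly.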

How to build spark with Hive 1.x ?

2015-06-10 Thread Neal Yin
I am trying to build the Spark 1.3 branch with Hive 1.1.0: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -Phive-0.13.1 -Dhive.version=1.1.0 -Dhive.version.short=1.1.0 -DskipTests clean package I got the following error: Failed to execute goal on project spark-hive_2.10:

dataframe call, how to control number of tasks for a stage

2015-04-16 Thread Neal Yin
I am having trouble controlling the number of Spark tasks for a stage. This is on the latest Spark 1.3.x source build. val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) sc.getConf.get("spark.default.parallelism") - set to 10 val t1 = hiveContext.sql("FROM SalesJan2009 select *") val t2 =
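One likely explanation, offered as a hedged note: in Spark 1.3, shuffle stages produced by DataFrame/SQL operations take their task count from spark.sql.shuffle.partitions (default 200), not from spark.default.parallelism, which applies to RDD operations. A minimal configuration sketch:

```scala
// Hedged sketch: DataFrame/SQL shuffle stages in Spark 1.3 are sized by
// spark.sql.shuffle.partitions (default 200), not spark.default.parallelism.
// Setting it on the SQLContext/HiveContext before running the query:
hiveContext.setConf("spark.sql.shuffle.partitions", "10")
```

Narrow stages (no shuffle) are instead sized by the number of input partitions, so this setting only affects stages after a shuffle.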

Re: Spark Yarn-client Kerberos on remote cluster

2015-04-14 Thread Neal Yin
If your localhost can't talk to a KDC, you can't access a kerberized cluster. A keytab file alone is not enough. -Neal On 4/14/15, 3:54 AM, philippe L lanckvrind.p@gmail.com wrote: Dear All, I would like to know if it's possible to configure the SparkConf() in order to interact with a remote

Re: Running Spark on Gateway - Connecting to Resource Manager Retries

2015-04-14 Thread Neal Yin
Your YARN access is not configured. 0.0.0.0:8032 is the default YARN ResourceManager address; I guess you don't have yarn-site.xml in your classpath. -Neal From: Vineet Mishra clearmido...@gmail.com Date: Tuesday, April 14, 2015 at 12:05 AM To:
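For context, the ResourceManager address comes from the yarn.resourcemanager.address property in yarn-site.xml; when that file is absent from the classpath, the client falls back to 0.0.0.0:8032 and retries against the local machine. A hedged sketch of the relevant entry; the hostname is a placeholder:

```xml
<!-- Hedged sketch of a minimal yarn-site.xml entry; "your-rm-host" is a
     placeholder for the cluster's actual ResourceManager hostname. -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>your-rm-host:8032</value>
</property>
```

Placing the cluster's yarn-site.xml (and core-site.xml) on the gateway's classpath, e.g. via HADOOP_CONF_DIR, is the usual fix.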

data frame API, change groupBy result column name

2015-03-30 Thread Neal Yin
I ran a line like the following: tb2.groupBy("city", "state").avg("price").show and got this result:

city              state              AVG(price)
Charlestown       New South Wales    1200.0
Newton ...        MA                 1200.0
Coral Gables ...  FL                 1200.0
Castricum
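One way to control that generated column name, offered as a hedged sketch: aggregate through agg(...) and alias the result instead of calling avg directly on the grouped data. The DataFrame and column names ("tb2", "city", "state", "price") are taken from the question above:

```scala
import org.apache.spark.sql.functions.avg

// Hedged sketch: agg(...) with an alias replaces the auto-generated
// "AVG(price)" header with a chosen name ("avg_price" here is illustrative).
val result = tb2.groupBy("city", "state")
  .agg(avg("price").as("avg_price"))
result.show()
```

Alternatively, withColumnRenamed("AVG(price)", "avg_price") on the result renames the column after the fact.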