RE: spark slave cannot execute without admin permission on windows

2015-02-19 Thread Judy Nash
+ dev mailing list. If this is supposed to work, is there a regression then? The Spark core code shows the permission for the file copied to \work is set to a+x at line 442 of

Re: Replacing Jetty with TomCat

2015-02-19 Thread Niranda Perera
Hi Sean, The issue we have here is that all our products are based on a single platform, and we try to make all our products as coherent with our platform as possible. So, having two web services in one instance would not be a very elegant solution. That is why we were seeking a way to switch

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Michael Armbrust
P.S.: For some reason, replacing import sqlContext.createSchemaRDD with import sqlContext.implicits._ doesn't do the implicit conversions. registerTempTable gives a syntax error. I will dig deeper tomorrow. Has anyone seen this? We will write up a whole migration guide before the final

Re: Hive SKEWED feature supported in Spark SQL ?

2015-02-19 Thread Michael Armbrust
1) Is SKEWED BY honored? If so, has anyone run into directories not being created? It is not. 2) If it is not honored, does it matter? Hive introduced this feature to better handle joins where tables had a skewed distribution on the keys being joined, so that the single mapper handling one of

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Krishna Sankar
Excellent. Explicit toDF() works. a) employees.toDF().registerTempTable("Employees") - works b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile Adding to my earlier tests: 4.0 SQL from Scala and Python 4.1 result = sqlContext.sql("SELECT * from Employees WHERE State = 'WA'") OK 4.2
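The workaround Krishna describes can be sketched as follows. This is a hedged sketch against the Spark 1.3 RC Scala API; the `Employee` case class, the sample data, and an existing `SparkContext` named `sc` are assumptions for illustration, not part of the thread:

```scala
import org.apache.spark.sql.SQLContext

// Hypothetical schema for illustration.
case class Employee(name: String, state: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// In the 1.3 RC, RDDs no longer convert implicitly via createSchemaRDD;
// calling toDF() explicitly restores the DataFrame conversion.
val employees = sc.parallelize(Seq(Employee("Ann", "WA"))).toDF()
employees.registerTempTable("Employees")
val result = sqlContext.sql("SELECT * FROM Employees WHERE state = 'WA'")
```

The same explicit `toDF()` call is what makes `saveAsParquetFile` reachable on a plain RDD, per point b) above.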

Re: Replacing Jetty with TomCat

2015-02-19 Thread Ewan Higgs
To add to Sean and Reynold's point: please correct me if I'm wrong, but Spark depends on hadoop-common, which also uses Jetty in the HttpServer2 code. So even if you remove Jetty from Spark by making it an optional dependency, it will be pulled in by Hadoop. So you'll still see that your

Hive SKEWED feature supported in Spark SQL ?

2015-02-19 Thread The Watcher
I have done some testing of inserting into tables defined in Hive using 1.2, and I can see that the PARTITION clause is honored: data files get created in multiple subdirectories correctly. I tried the SKEWED BY ... ON ... STORED AS DIRECTORIES clause on the CREATE TABLE statement, but I didn't see
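For reference, the kind of DDL under test looks like the following. This is a hypothetical table issued through a Hive-enabled context (assumed `hiveContext`); the table and column names are made up, and whether Spark SQL honors the STORED AS DIRECTORIES part is exactly the open question in this thread:

```scala
// Hive skewed-table DDL: list-bucket the hot key value into its own directory.
hiveContext.sql("""
  CREATE TABLE clicks (id INT, country STRING)
  SKEWED BY (country) ON ('US')
  STORED AS DIRECTORIES
""")
```

Per Michael's answer above, Spark SQL 1.2 does not honor SKEWED BY, so no per-key subdirectories are created on insert.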

Re: Replacing Jetty with TomCat

2015-02-19 Thread Sean Owen
Sure, but you are not using Jetty at all. It's invisible to you. It's not as if you have to set up and maintain a Jetty container. I don't think your single platform for your apps is relevant. You can turn off the UI, but as Reynold said, the HTTP servers are also part of the core data transport

Have Friedman's glmnet algo running in Spark

2015-02-19 Thread mike
Dev List, A couple of colleagues and I have gotten several versions of the glmnet algo coded and running on Spark RDDs. The glmnet algo (http://www.jstatsoft.org/v33/i01/paper) is a very fast algorithm for generating coefficient paths, solving penalized regression with elastic net penalties. The
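For context on what glmnet computes: its inner loop is coordinate descent, whose core is a soft-thresholding update applied per coefficient under the elastic net penalty lambda * (alpha * |b| + (1 - alpha)/2 * b^2). A minimal standalone sketch of that building block (not the implementation posted to the list; names and the unit-variance simplification are assumptions):

```scala
object ElasticNetSketch {
  // Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0),
  // the heart of glmnet's coordinate-descent step.
  def softThreshold(z: Double, gamma: Double): Double =
    math.signum(z) * math.max(math.abs(z) - gamma, 0.0)

  // One coordinate update for the elastic net, assuming the feature has
  // unit variance: shrink by the L1 part, then scale down by the L2 part.
  def coordinateUpdate(z: Double, lambda: Double, alpha: Double): Double =
    softThreshold(z, lambda * alpha) / (1.0 + lambda * (1.0 - alpha))
}
```

With alpha = 1 this reduces to the lasso update; with alpha = 0, to ridge-style shrinkage, which is what lets one solver trace the whole elastic net family of coefficient paths.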

Spark SQL, Hive Parquet data types

2015-02-19 Thread The Watcher
Still trying to get my head around Spark SQL and Hive. 1) Let's assume I *only* use Spark SQL to create and insert data into Hive tables, declared in a Hive metastore. Does it matter at all whether Hive supports the data types I need with Parquet, or is all that matters what Catalyst, Spark's Parquet
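The scenario being asked about can be sketched like this. Everything here is hypothetical illustration (table names, columns, and an assumed Hive-enabled `hiveContext`), meant only to pin down what "only use Spark SQL against the Hive metastore" means:

```scala
// Declare a Parquet-backed table in the Hive metastore, then populate it,
// issuing both statements purely through Spark SQL.
hiveContext.sql("CREATE TABLE events (id INT, ts TIMESTAMP) STORED AS PARQUET")
hiveContext.sql("INSERT INTO TABLE events SELECT id, ts FROM staging_events")
```

The question is then which layer's Parquet type support governs: Hive's, or the Catalyst/Spark Parquet path that actually reads and writes the files.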

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Timothy Chen
+1 (non-binding) Tested Mesos coarse-grained and fine-grained modes on a 4-node Mesos cluster with a simple shuffle/map task. Will be testing with a more complete suite (i.e. spark-perf) once the infrastructure is set up to do so. Tim On Thu, Feb 19, 2015 at 12:50 PM, Krishna Sankar ksanka...@gmail.com wrote:

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Corey Nolet
+1 (non-binding) - Verified signatures using [1] - Built on MacOSX Yosemite - Built on Fedora 21 Each build was run with the Hadoop 2.4 version and the yarn, hive, and hive-thriftserver profiles. I am having trouble getting all the tests passing in a single run on both machines, but we have this