+ dev mailing list
If this is supposed to work, is there a regression then?
The Spark core code shows that the permission for the copied file to \work is set to a+x
at Line 442 of
Hi Sean,
The issue we have here is that all our products are based on a single
platform and we try to make all our products coherent with our platform as
much as possible. So, having two web services in one instance would not be
a very elegant solution. That is why we were seeking a way to switch
P.S.: For some reason, replacing import sqlContext.createSchemaRDD with
import sqlContext.implicits._ doesn't do the implicit conversions.
registerTempTable
gives a syntax error. I will dig deeper tomorrow. Has anyone seen this?
We will write up a whole migration guide before the final
1) Is SKEWED BY honored? If so, has anyone run into directories not being
created?
It is not.
2) If it is not honored, does it matter? Hive introduced this feature to
better handle joins where tables had a skewed distribution on keys joined
on so that the single mapper handling one of
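For reference, the Hive DDL being asked about looks roughly like the following; this is a hypothetical sketch (the table and column names are made up), and whether Spark SQL actually honors STORED AS DIRECTORIES is exactly the open question here:

```scala
// Hypothetical sketch of a skewed Hive table definition; table and
// column names are invented for illustration. Requires a HiveContext
// backed by a Hive metastore (Spark 1.2/1.3-era API).
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

hc.sql("""
  CREATE TABLE orders (order_id INT, amount DOUBLE)
  PARTITIONED BY (state STRING)
  SKEWED BY (state) ON ('WA', 'CA')
  STORED AS DIRECTORIES
""")
```

In Hive itself, STORED AS DIRECTORIES makes the listed skewed values land in their own subdirectories so skewed joins can be optimized; the question above is whether inserts through Spark SQL produce that layout.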
Excellent. Explicit toDF() works.
a) employees.toDF().registerTempTable("Employees") - works
b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile
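Put together, the explicit toDF() workaround described above looks like this; it is a minimal sketch using a hypothetical Employee case class and Spark 1.3-style APIs in the shell:

```scala
// Minimal sketch of the explicit toDF() workaround (Spark 1.3-style
// API). The Employee case class and its fields are hypothetical.
import org.apache.spark.sql.SQLContext

case class Employee(name: String, state: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._  // replaces the old createSchemaRDD import

val employees = sc.parallelize(Seq(Employee("Alice", "WA")))

// a) Explicit conversion, then register as a temp table
employees.toDF().registerTempTable("Employees")

// b) The same explicit conversion before saveAsParquetFile
employees.toDF().saveAsParquetFile("/tmp/employees.parquet")
```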
Adding to my earlier tests:
4.0 SQL from Scala and Python
4.1 result = sqlContext.sql("SELECT * FROM Employees WHERE State = 'WA'") OK
4.2
To add to Sean and Reynold's point:
Please correct me if I'm wrong, but Spark depends on hadoop-common which
also uses jetty in the HttpServer2 code. So even if you remove jetty
from Spark by making it an optional dependency, it will be pulled in by
Hadoop.
So you'll still see that your
I have done some testing of inserting into tables defined in Hive using 1.2
and I can see that the PARTITION clause is honored: data files get created
in multiple subdirectories correctly.
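The kind of partitioned insert tested above can be sketched as follows; the table and partition names are hypothetical, and this assumes a HiveContext against a metastore where both tables already exist:

```scala
// Sketch of a partitioned insert through Spark SQL; "orders" and
// "staging_orders" are hypothetical tables. With PARTITION honored,
// the data files land under a state=WA subdirectory of the table.
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

hc.sql("""
  INSERT INTO TABLE orders PARTITION (state = 'WA')
  SELECT order_id, amount FROM staging_orders WHERE state = 'WA'
""")
```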
I tried the SKEWED BY ON STORED AS DIRECTORIES clause on the CREATE TABLE
statement but I didn't see
Sure, but you are not using Netty at all. It's invisible to you. It's
not as if you have to set up and maintain a Jetty container. I don't
think your single platform for your apps is relevant.
You can turn off the UI, but as Reynold said, the HTTP servers are
also part of the core data transport
Dev List,
A couple of colleagues and I have gotten several versions of glmnet algo coded
and running on Spark RDD. glmnet algo (http://www.jstatsoft.org/v33/i01/paper)
is a very fast algorithm for generating coefficient paths solving penalized
regression with elastic net penalties. The
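For context, the elastic-net problem that glmnet solves (as set out in the paper linked above, with N observations, response y, predictors x, and mixing parameter alpha) is:

```latex
\min_{\beta_0,\,\beta}\; \frac{1}{2N} \sum_{i=1}^{N}
  \left( y_i - \beta_0 - x_i^{T}\beta \right)^2
  + \lambda \left[ (1-\alpha)\,\tfrac{1}{2}\,\|\beta\|_2^2
  + \alpha\,\|\beta\|_1 \right]
```

Here alpha = 1 gives the lasso penalty and alpha = 0 gives ridge; glmnet computes the full coefficient path over a grid of lambda values via cyclical coordinate descent, which is what makes a distributed RDD implementation attractive.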
Still trying to get my head around Spark SQL Hive.
1) Let's assume I *only* use Spark SQL to create and insert data into HIVE
tables, declared in a Hive meta-store.
Does it matter at all if Hive supports the data types I need with Parquet,
or is all that matters what Catalyst spark's parquet
+1 (non-binding)
Tested Mesos coarse/fine-grained mode with 4 nodes Mesos cluster with
simple shuffle/map task.
Will be testing with more complete suite (ie: spark-perf) once the
infrastructure is setup to do so.
Tim
On Thu, Feb 19, 2015 at 12:50 PM, Krishna Sankar ksanka...@gmail.com wrote:
+1 (non-binding)
- Verified signatures using [1]
- Built on Mac OS X Yosemite
- Built on Fedora 21
Each build was run with the Hadoop-2.4 version and the yarn, hive, and
hive-thriftserver profiles.
I am having trouble getting all the tests passing on a single run on both
machines but we have this