What’s the command line you used to build Spark? Notice that you need to
add |-Phive-thriftserver| to build the JDBC Thrift server. This profile
was once removed in v1.1.0, but added back in v1.2.0 because of a
dependency issue introduced by Scala 2.11 support.
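For reference, a complete command line similar to the one in the "Building Spark" docs looks like this (the Hadoop profile and version here are illustrative, pick the ones matching your cluster):
|mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
|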
On 11/27/14 12:53 AM,
What version are you trying to build? At first I assumed you were using
the most recent master, but from your first mail it seems that you are
trying to build Spark v1.1.0?
On 11/27/14 12:57 PM, vdiwakar.malladi wrote:
Thanks for your response.
I'm using the following command.
mvn
Hm, then the command line you used should be fine. I just tried it
locally and it works. Make sure to run it in the root directory of the
Spark source tree (don’t |cd| into assembly).
On 11/27/14 1:35 PM, vdiwakar.malladi wrote:
Yes, I'm building it from Spark 1.1.0
Thanks in advance.
I see. As the exception states, Maven can’t find |unzip|, which is
needed to build PySpark. So you need a Windows version of |unzip|
(probably from MinGW or Cygwin?).
On 11/27/14 2:10 PM, vdiwakar.malladi wrote:
Thanks for your prompt responses.
I'm generating assembly jar file from windows 7
Hey Venkat,
This behavior seems reasonable. According to the table names, I guess
|DAgents| is the fact table and |ContactDetails| is the dimension
table. Below is an explanation of a similar query; you may read |src|
as |DAgents| and |src1| as |ContactDetails|.
|0:
You may do this:
|table(users).groupBy('zip)('zip, count('user), countDistinct('user))
|
On 12/4/14 8:47 AM, Arun Luthra wrote:
I'm wondering how to do this kind of SQL query with PairRDDFunctions.
SELECT zip, COUNT(user), COUNT(DISTINCT user)
FROM users
GROUP BY zip
In the Spark scala API,
Window functions are not supported yet, but there is a PR for it:
https://github.com/apache/spark/pull/2953
On 12/5/14 12:22 PM, Dai, Kevin wrote:
Hi, ALL
How can I group by one column and order by another one, then select
the first row for each group (which is just like window function
++ seen1)
}).mapValues { case (count, seen) =>
  (count, seen.size)
}
|
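For completeness, here is a minimal self-contained sketch of the PairRDDFunctions approach the snippet above comes from, assuming |users| is an |RDD[(String, String)]| of (zip, user) pairs:
|users.aggregateByKey((0L, Set.empty[String]))(
  // within a partition: bump the count, remember the user
  { case ((count, seen), user) => (count + 1L, seen + user) },
  // across partitions: add counts, union the user sets
  { case ((count0, seen0), (count1, seen1)) => (count0 + count1, seen0 ++ seen1) }
).mapValues { case (count, seen) => (count, seen.size) }
|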
On 12/5/14 3:47 AM, Arun Luthra wrote:
Is that Spark SQL? I'm wondering if it's possible without spark SQL.
On Wed, Dec 3, 2014 at 8:08 PM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
You may do
You may access it via something like |SELECT filterIp.element FROM tb|,
just like Hive. Or if you’re using the Spark SQL DSL, you can use
|tb.select("filterIp.element".attr)|.
On 12/8/14 1:08 PM, Xuelin Cao wrote:
Hi,
I'm generating a Spark SQL table from an offline Json file.
The
Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
HiveServer2. You don't need to run Hive, but you do need a working
Metastore.
On 12/9/14 3:59 PM, Anas Mosaad wrote:
Thanks Judy, this is exactly what I'm looking for. However, and please
forgive me if it's a dumb question:
(0.106 seconds)
0: jdbc:hive2://localhost:1
Kindly advise, what am I missing? I want to read the RDD using SQL
from outside spark-shell (i.e. like any other relational database).
On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote
)
On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
How did you register the table under spark-shell? Two things to
notice:
1. To interact with Hive, HiveContext instead of SQLContext must
be used.
2. `registerTempTable` doesn't
There are several overloaded versions of both |jsonFile| and |jsonRDD|.
Schema inference is kinda expensive since it requires an extra Spark
job. You can avoid it by storing the inferred schema and then using it
with the following two methods:
* |def jsonFile(path:
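For example, a minimal sketch of this pattern (the file path is a placeholder):
|val schema = sqlContext.jsonFile("people.json").schema  // pays the inference job once
val noInference = sqlContext.jsonFile("people.json", schema)  // no extra Spark job
|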
There isn’t a SQL statement that directly maps to |SQLContext.isCached|,
but you can use |EXPLAIN EXTENDED| to check whether the underlying
physical plan is an |InMemoryColumnarTableScan|.
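For example, something like this prints the physical plan of a cached table (a sketch; the table name is illustrative):
|sqlContext.cacheTable("tbl")
// look for InMemoryColumnarTableScan in the printed physical plan
sqlContext.sql("EXPLAIN EXTENDED SELECT * FROM tbl").collect().foreach(println)
|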
On 12/13/14 7:14 AM, Judy Nash wrote:
Hello,
A few questions on Spark SQL:
1) Does Spark SQL support
It seems that the Thrift server you connected to is the original
HiveServer2 rather than Spark SQL HiveThriftServer2.
On 12/19/14 4:08 PM, jeanlyn92 wrote:
when I run *cache table as* in the beeline which communicates with
the thrift server, I got the following error:
14/12/19 15:57:05 ERROR
secs.
Hari
On Wed, Dec 17, 2014 at 10:09 PM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
What kinds of tables underlie the SchemaRDDs? Could you
please provide the DDL of the tables and the query you executed?
On 12/18/14 6:15 AM
Evert - Thanks for the instructions, this is generally useful in other
scenarios, but I think this isn’t what Shahab needs, because
|saveAsTable| actually saves the contents of the SchemaRDD into Hive.
Shahab - As Michael has answered in another thread, you may try
On 12/17/14 1:43 PM, Jerry Raj wrote:
Hi,
I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I
have two tables (backed by Parquet files) and I need to do a join
across them using a common field (user_id). This works fine using
standard SQL but not using the
Could you please file a JIRA together with the Git commit you're using?
Thanks!
On 12/18/14 2:32 AM, Hao Ren wrote:
Hi,
When running SparkSQL branch 1.2.1 on EC2 standalone cluster, the following
query does not work:
create table debug as
select v1.*
from t1 as v1 left join t2 as v2
on
Hi Schweichler,
This is an interesting and practical question. I'm not familiar with how
Tableau works, but would like to share some thoughts.
In general, big data analytics frameworks like MR and Spark tend to
perform immutable functional transformations over immutable data. Whilst
in your
Hi Ji,
Spark SQL 1.2 only works with either Hive 0.12.0 or 0.13.1 due to Hive
API/protocol compatibility issues. When interacting with Hive 0.11.x,
connections and simple queries may succeed, but things may go crazy in
unexpected corners (like UDF).
Cheng
On 12/22/14 4:15 PM, Ji ZHANG
This depends on which output format you want. For Parquet, you can
simply do this:
|hiveContext.table("some_db.some_table").saveAsParquetFile("hdfs://path/to/file")
|
On 12/23/14 5:22 PM, LinQili wrote:
Hi Leo:
Thanks for your reply.
I am talking about using hive from spark to export data from
Here is a more cleaned-up version, which can be used in |./sbt/sbt
hive/console| to easily reproduce this issue:
|sql("SELECT * FROM src WHERE key % 2 = 0").
  sample(withReplacement = false, fraction = 0.05).
  registerTempTable("sampled")
println(table("sampled").queryExecution)
val query = sql("SELECT
Could you please provide a complete stack trace? Also it would be good
if you could share your hive-site.xml as well.
On 12/23/14 4:42 PM, Dai, Kevin wrote:
Hi, there
When I use the hive udf from_unixtime with the HiveContext, the job
blocks and the log is as follows:
sun.misc.Unsafe.park(Native
Generally you can use |-Dsun.io.serialization.extendedDebugInfo=true| to
enable serialization debugging information when serialization exceptions
are raised.
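For example, with spark-submit (a sketch):
|spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
  --conf "spark.executor.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
  ...
|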
On 12/24/14 1:32 PM, bigdata4u wrote:
I am trying to use sql over Spark streaming using Java. But I am getting
Serialization
Hao and Lam - I think the issue here is that |registerRDDAsTable| only
creates a temporary table, which is not seen by Hive metastore.
And Michael had once given a workaround for creating external Parquet
table:
Hi Roc,
Spark SQL 1.2.0 can only work with Hive 0.12.0 or Hive 0.13.1
(controlled by compilation flags); versions prior to 1.2.0 only work
with Hive 0.12.0. So Hive 0.15.0-SNAPSHOT is not an option.
Would like to add that this is due to a backward compatibility issue of
the Hive metastore, AFAIK
Most of the time a NoSuchMethodError means wrong classpath settings, and
some jar file is overridden by a wrong version. In your case it could be
netty.
On 1/3/15 1:36 PM, Niranda Perera wrote:
Hi all,
I am evaluating the spark sources API released with Spark 1.2.0. But
I'm getting a
Currently no if you don't want to use Spark SQL's HiveContext. But we're
working on adding partitioning support to the external data sources API,
with which you can create, for example, partitioned Parquet tables
without using Hive.
Cheng
On 1/26/15 8:47 AM, Danny Yates wrote:
Thanks
pseudo-distributed YARN cluster. Would you mind elaborating more on the
steps to reproduce this bug?
Thanks
On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian
lian.cs@gmail.com
Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0;
no other versions are supported.
Best,
Cheng
On 1/25/15 12:18 AM, guxiaobo1982 wrote:
Hi,
I built and started a single node standalone Spark 1.2.0 cluster along
with a single node Hive 0.14.0 instance installed by
Hi Ayoub,
The doc page isn’t wrong, but it’s indeed confusing.
|spark.sql.parquet.compression.codec| is used when you’re writing Parquet
files with something like |data.saveAsParquetFile(...)|. However, you are
using Hive DDL in the example code. All Hive DDLs and commands like
|SET| are
|IF| is implemented as a generic UDF in Hive (|GenericUDFIf|). It seems
that this function can’t be properly resolved. Could you provide a
minimal code snippet that reproduces this issue?
Cheng
On 1/20/15 1:22 AM, Xuelin Cao wrote:
Hi,
I'm trying to migrate some hive scripts to
Guess this can be helpful:
http://stackoverflow.com/questions/14252615/stack-function-in-hive-how-to-specify-multiple-aliases
On 1/19/15 8:26 AM, mucks17 wrote:
Hello
I use Hive on Spark and have an issue with assigning several aliases to the
output (several return values) of a UDF. I ran
for future groupings (assuming we cache I suppose)
Mick
On 20 Jan 2015, at 20:44, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
First of all, even if the underlying dataset is partitioned as
expected, a shuffle can’t be avoided, because Spark SQL knows
I think you can resort to a Hive table partitioned by date
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables
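For example, a sketch of such a DDL through HiveContext (table and column names are illustrative):
|hiveContext.sql("CREATE TABLE logs (line STRING) PARTITIONED BY (dt STRING)")
|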
On 1/11/15 9:51 PM, Paul Wais wrote:
Dear List,
What are common approaches for addressing over a union of tables /
RDDs? E.g.
-means model from its cluster centers. -Xiangrui
On Tue, Jan 20, 2015 at 11:55 AM, Cheng Lian lian.cs@gmail.com wrote:
This is because KMeansModel is neither a built-in type nor a user defined
type recognized by Spark SQL. I think you can write your own UDT version of
KMeansModel in this case
Hey Surbhit,
In this case, the web UI stats are not accurate. Please refer to this
thread for an explanation:
https://www.mail-archive.com/user@spark.apache.org/msg18919.html
Cheng
On 1/13/15 1:46 AM, Surbhit wrote:
Hi,
I am using spark 1.1.0.
I am using the spark-sql shell to run all the
Hey Yana,
Sorry for the late reply, missed this important thread somehow. And many
thanks for reporting this. It turned out to be a bug: filter pushdown
is only enabled when using client side metadata, which is not expected,
because the task side metadata code path is more performant. And I
You need to provide the key and value types for map types, the element
type for array types, and whether they contain nulls:
|StructType(Array(
  StructField("map_field", MapType(keyType = IntegerType, valueType = StringType,
    valueContainsNull = true), nullable = true),
For example, |Sort| does a defensive copy as it needs
to cache rows for sorting.
Keen to get the best performance and the best blend of SparkSQL and
functional Spark.
Cheers,
Nathan
From: Cheng Lian lian.cs@gmail.com mailto:lian.cs@gmail.com
Date: Monday, 12 January 2015 1:21 am
First of all, even if the underlying dataset is partitioned as expected,
a shuffle can’t be avoided, because Spark SQL knows nothing about the
underlying data distribution. However, this does reduce network IO.
You can prepare your data like this (say |CustomerCode| is a string
field with
Hey Yi,
I'm quite unfamiliar with Hadoop/HDFS auth mechanisms for now, but would
like to investigate this issue later. Would you please open a JIRA for
it? Thanks!
Cheng
On 1/19/15 1:00 AM, Yi Tian wrote:
Is there any way to support multiple users executing SQL on one thrift
server?
I
I had once worked on a named row feature but haven’t had time to finish
it. It looks like this:
|sql(...).named.map { row: NamedRow =>
  row[Int]('key) -> row[String]('value)
}
|
Basically the |named| method generates a field name to ordinal map for
each RDD partition. This map is then shared
|spark.sql.parquet.filterPushdown| defaults to |false| because there’s a
bug in Parquet which may cause NPE, please refer to
http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration
This bug hasn’t been fixed in Parquet master. We’ll turn this on once
the bug is fixed.
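If you still want to experiment with it, it can be turned on per session, e.g.:
|sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
|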
In Spark SQL, Parquet filter pushdown doesn’t cover |HiveTableScan| for
now. May I ask why you prefer |HiveTableScan| rather than
|ParquetTableScan|?
Cheng
On 1/19/15 5:02 PM, Xiaoyu Wang wrote:
The *spark.sql.parquet.filterPushdown=true* has been turned on. But
set
This is because |KMeansModel| is neither a built-in type nor a user
defined type recognized by Spark SQL. I think you can write your own UDT
version of |KMeansModel| in this case. You may refer to
|o.a.s.mllib.linalg.Vector| and |o.a.s.mllib.linalg.VectorUDT| as an
example.
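A very rough skeleton of what such a UDT could look like (method names follow the 1.2-era |UserDefinedType| API; imports and the actual serialization logic are omitted):
|class KMeansModelUDT extends UserDefinedType[KMeansModel] {
  // store each model as an array of cluster centers, each an array of doubles
  override def sqlType: DataType = ArrayType(ArrayType(DoubleType, false), false)
  override def serialize(obj: Any): Any = ???             // KMeansModel => nested Seq
  override def deserialize(datum: Any): KMeansModel = ??? // and back
  override def userClass: Class[KMeansModel] = classOf[KMeansModel]
}
|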
Cheng
On 1/20/15
On 1/27/15 5:55 PM, Cheng Lian wrote:
On 1/27/15 11:38 AM, Manoj Samel wrote:
Spark 1.2, no Hive, prefer not to use HiveContext to avoid metastore_db.
Use case is Spark Yarn app will start and serve as query server for
multiple users i.e. always up and running. At startup, there is
option
On 1/21/15 10:39 AM, Cheng Lian wrote:
Oh yes, thanks for adding that using |sc.hadoopConfiguration.set| also
works :-)
On Wed, Jan 21, 2015 at 7:11 AM, Yana Kadiyska
yana.kadiy...@gmail.com mailto:yana.kadiy...@gmail.com wrote:
Thanks for looking Cheng. Just to clarify in case other
According to the Gist Ayoub provided, the schema is fine. I reproduced
this issue locally; it should be a bug, but I don't think it's related to
SPARK-5236. Will investigate this soon.
Ayoub - would you mind filing a JIRA for this issue? Thanks!
Cheng
On 1/30/15 11:28 AM, Michael
Yeah, currently there isn't such a repo. However, the Spark team is
working on this.
Cheng
On 1/30/15 8:19 AM, Ayoub wrote:
I am not personally aware of a repo for snapshot builds.
In my use case, I had to build spark 1.2.1-snapshot
see
Hey Alexey,
You need to use |HiveContext| in order to access Hive UDFs. You may try
it with |bin/spark-sql| (|src| is a Hive table):
|spark-sql> select key / 3 from src limit 10;
79.33
28.668
103.67
9.0
55.0
136.34
85.0
92.67
Hey Jorge,
This is expected, because there isn’t an obvious mapping from |Set[T]|
to any SQL type. Currently we have complex types like array, map, and
struct, which are inherited from Hive. In your case, I’d transform the
|Set[T]| into a |Seq[T]| first, then Spark SQL can map it to an
operator may also cache row objects. This is very
implementation specific and may change between versions.
Cheers,
~N
From: Michael Armbrust mich...@databricks.com
mailto:mich...@databricks.com
Date: Saturday, 10 January 2015 3:41 am
To: Cheng Lian lian.cs@gmail.com mailto:lian.cs
Hey Nathan,
Thanks for sharing, this is a very interesting post :) My comments are
inlined below.
Cheng
On 1/7/15 11:53 AM, Nathan McCarthy wrote:
Hi,
I’m trying to use a combination of SparkSQL and ‘normal' Spark/Scala
via rdd.mapPartitions(…). Using the latest release 1.2.0.
Simple
Hi Manoj,
Yes, you've already hit the point. I think the timestamp type support in
the in-memory columnar storage can be a good reference for you. Also, you
may want to enable compression support for the decimal type by adding the
DECIMAL column type to RunLengthEncoding.supports and
Hi Jianshi,
When accessing a Hive table with a Parquet SerDe, Spark SQL tries to
convert it into Spark SQL's native Parquet support for better performance.
And yes, predicate push-down and column pruning are applied here. In 1.3.0,
we'll also cover the write path except for writing partitioned tables.
Hey Xuelin, which data item in the Web UI did you check?
On 1/7/15 5:37 PM, Xuelin Cao wrote:
Hi,
Curious and curious. I'm puzzled by the Spark SQL cached table.
Theoretically, the cached table should be a columnar table, and only
scan the columns included in my SQL.
However, in my
the input data for each task (in the stage
detail page). And the sum of the input data for each task is also 1212.5MB
On Thu, Jan 8, 2015 at 6:40 PM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
Hey Xuelin, which data item in the Web UI did you check?
On 1/7/15 5
Spark SQL supports the Hive insertion statement (Hive 0.14.0 style
insertion is not supported though):
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
The small SQL dialect provided in Spark SQL doesn't support insertion
This package has moved here: https://github.com/databricks/spark-avro
On 1/6/15 5:12 AM, yanenli2 wrote:
Hi All,
I want to use SparkSQL to manipulate data in Avro format. I
found a solution at https://github.com/marmbrus/sql-avro . However it
doesn't compile successfully anymore with
The |+| operator only handles numeric data types, but you may register your
own concat function like this:
|sqlContext.registerFunction("concat", (s: String, t: String) => s + t)
sqlContext.sql("select concat('$', col1) from tbl")
|
Cheng
On 1/5/15 1:13 PM, RK wrote:
The issue is happening when I try
drops.
If you try like this:
cacheTable("tbl")
sql("select * from tbl").collect()
sql("select name from tbl").collect()
sql("select * from tbl").collect()
Is the input data of the 3rd SQL bigger than 49.1KB?
On Thu, Jan 8, 2015 at 9:36 PM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs
Would you mind providing the query? If it's confidential, could you
please help construct a query that reproduces this issue?
Cheng
On 3/18/15 6:03 PM, Roberto Coluccio wrote:
Hi everybody,
When trying to upgrade from Spark 1.1.1 to Spark 1.2.x (tried both
1.2.0 and 1.2.1) I encounter a
You should probably increase executor memory by setting
spark.executor.memory.
Full list of available configurations can be found here
http://spark.apache.org/docs/latest/configuration.html
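For example (the value is illustrative):
|spark-submit --conf spark.executor.memory=4g ...
|
or equivalently |spark-submit --executor-memory 4g ...|.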
Cheng
On 3/18/15 9:15 PM, Yiannis Gkoufas wrote:
Hi there,
I was trying the new DataFrame API with
with only (and less than 22) String fields.
Hope the situation is a bit more clear. Thanks anyone who will help me
out here.
Roberto
On Wed, Mar 18, 2015 at 12:09 PM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
Would you mind providing the query? If it's
Yes
On 3/18/15 8:20 PM, sequoiadb wrote:
hey guys,
In my understanding SparkSQL only supports JDBC connection through hive thrift
server, is this correct?
Thanks
Currently there’s no convenient way to convert a
|SchemaRDD|/|JavaSchemaRDD| back to an |RDD|/|JavaRDD| of some case
class. But you can convert an |RDD|/|JavaRDD| into an
|RDD[Row]|/|JavaRDD[Row]| using |schemaRdd.rdd| and |new
JavaRDD[Row](schemaRdd.rdd)|.
Cheng
On 3/15/15 10:22 PM, Renato
Not quite sure whether I understand your question properly. But if you
just want to read the partition columns, it’s pretty easy. Take the
“year” column as an example, you may do this in HiveQL:
|hiveContext.sql("SELECT year FROM speed")
|
or in DataFrame DSL:
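|hiveContext.table("speed").select("year")
|
(a sketch; |table| and the string-based |select| are the 1.3 DataFrame methods)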
Hi Judy,
In the case of |HadoopRDD| and |NewHadoopRDD|, partition number is
actually decided by the |InputFormat| used. And
|spark.sql.inMemoryColumnarStorage.batchSize| is not related to
partition number, it controls the in-memory columnar batch size within a
single partition.
Also, what
Spark SQL supports most commonly used features of HiveQL. However,
different HiveQL statements are executed in different manners:
1.
DDL statements (e.g. |CREATE TABLE|, |DROP TABLE|, etc.) and
commands (e.g. |SET key = value|, |ADD FILE|, |ADD JAR|, etc.)
In most cases, Spark SQL
That's an unfortunate documentation bug in the programming guide... We
failed to update it after making the change.
Cheng
On 2/28/15 8:13 AM, Deborah Siegel wrote:
Hi Michael,
Would you help me understand the apparent difference here?
The Spark 1.2.1 programming guide indicates:
Note
This article by Ryan Blue should be helpful for understanding the problem:
http://ingest.tips/2015/01/31/parquet-row-group-size/
The TL;DR is, you may decrease |parquet.block.size| to reduce memory
consumption. Anyway, 100K columns is a really big burden for Parquet,
but I guess your data should
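In any case, lowering the row group size goes through the Hadoop configuration (a sketch; the 64 MB value is just an example):
|sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)
|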
The parquet-tools code should be pretty helpful (although it's Java)
https://github.com/apache/incubator-parquet-mr/tree/master/parquet-tools/src/main/java/parquet/tools/command
On 3/10/15 12:25 AM, Shuai Zheng wrote:
Hi All,
I have a lot of parquet files, and I try to open them directly
Hey Yong,
It seems that Hadoop `FileSystem` adds the size of a block to the
metrics even if you only touch a fraction of it (reading Parquet
metadata for example). This behavior can be verified by the following
snippet:
```scala
import org.apache.spark.sql.Row
import
It should be OK. If you encounter problems with keeping a long-lived
connection to the Thrift server open, that would be a bug.
Cheng
On 3/9/15 6:41 PM, fanooos wrote:
I have some applications developed using PHP and currently we have a problem
in connecting these applications to spark sql thrift
Hey Masf,
I’ve created SPARK-6360
https://issues.apache.org/jira/browse/SPARK-6360 to track this issue.
Detailed analysis is provided there. The TL;DR is, for Spark 1.1 and
1.2, if a SchemaRDD contains decimal or UDT column(s), after applying
any traditional RDD transformations (e.g.
, and the value of the partition
column to be inserted must come from a temporarily registered table/dataframe.
Patcharee
On 16. mars 2015 15:26, Cheng Lian wrote:
Not quite sure whether I understand your question properly. But if
you just want to read the partition columns, it’s pretty easy. Take
the “year
I don't see non-serializable objects in the provided snippets. But you
can always add -Dsun.io.serialization.extendedDebugInfo=true to Java
options to debug serialization errors.
Cheng
On 3/17/15 12:43 PM, anu wrote:
Spark Version - 1.1.0
Scala - 2.10.4
I have loaded the following type of data
Hey Yang,
My comments are in-lined below.
Cheng
On 3/18/15 6:53 AM, Yang Lei wrote:
Hello,
I am migrating my Spark SQL external datasource integration from Spark
1.2.x to Spark 1.3.
I noticed there are a couple of new filters now, e.g.
org.apache.spark.sql.sources.And. However, for a
This has been fixed by https://github.com/apache/spark/pull/5020
On 3/18/15 12:24 AM, Franz Graf wrote:
Hi all,
today we tested Spark 1.3.0.
Everything went pretty fine except that I seem to be unable to save an
RDD as parquet to HDFS.
A minimum example is:
import sqlContext.implicits._
//
$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-03-25 19:05 GMT+08:00 Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com:
Could you please provide the full stack trace?
On 3/25/15 6:26 PM, 李铖 wrote:
It is ok when I do query data from
You may resort to the generic save API introduced in 1.3, which supports
appending as long as the target data source supports it. And in 1.3,
Parquet does support appending.
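For example (a sketch; |df| is an arbitrary DataFrame and the path is a placeholder):
|df.save("hdfs://path/to/file", org.apache.spark.sql.SaveMode.Append)
|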
Cheng
On 3/26/15 4:13 PM, Richard Grossman wrote:
Hi
I've succeeded in writing a Kafka stream to a Parquet file in Spark 1.2
I couldn’t reproduce this with the following spark-shell snippet:
|scala> import sqlContext.implicits._
scala> Seq((1, 2)).toDF("a", "b")
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
|
The _common_metadata file is
We're working together with AsiaInfo on this. An initial version of
window function support will possibly be delivered in 1.4.0, but that's
not a promise yet.
Cheng
On 3/26/15 7:27 PM, Arush Kharbanda wrote:
It's not yet implemented.
https://issues.apache.org/jira/browse/SPARK-1442
On Thu, Mar 26,
as the HiveContext
constructor does not accept JavaSparkContext and JavaSparkContext
is not a subclass of SparkContext.
Anyone else have any idea? I suspect this is supported now.
On Sun, Mar 29, 2015 at 8:54 AM, Cheng Lian
lian.cs@gmail.com mailto:lian.cs@gmail.com wrote:
You may
The mysql command line client doesn't use JDBC to talk to the MySQL
server, so this doesn't verify anything.
I think this Hive metastore installation guide from Cloudera may be
helpful. Although this document is for CDH4, the general steps are the
same, and should help you to figure out the
Ah, sorry, my bad...
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
On 3/30/15 10:24 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
Hello Lian
Can you share the URL?
On Mon, Mar 30, 2015 at 6:12 PM, Cheng Lian lian.cs@gmail.com
(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Regards,
Deepak
On Fri, Mar 27, 2015 at 8:33 AM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
As the exception suggests, you don't have the MySQL JDBC driver on
your classpath
As the exception suggests, you don't have the MySQL JDBC driver on your
classpath.
On 3/27/15 10:45 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
I am unable to run spark-sql from the command line. I attempted the following
1)
export SPARK_HOME=/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
export
This should be a bug in Explode.eval(), which always assumes the
underlying SQL array is represented by a Scala Seq. Would you mind
opening a JIRA ticket for this? Thanks!
Cheng
On 3/27/15 7:00 PM, Jon Chase wrote:
Spark 1.3.0
Two issues:
a) I'm unable to get a lateral view explode
, 2015 at 7:26 PM, Cheng Lian
lian.cs@gmail.com mailto:lian.cs@gmail.com wrote:
I couldn’t reproduce this with the following spark-shell snippet:
|scala> import sqlContext.implicits._
scala> Seq((1, 2)).toDF("a", "b")
scala> res0.save("xxx
(), as it produced a
similar exception (though there was no use of explode there).
On Fri, Mar 27, 2015 at 7:20 AM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
This should be a bug in Explode.eval(), which always assumes
the underlying SQL array is represented by a Scala
:14 AM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
Forgot to mention that, would you mind also providing the full
stack trace of the exception thrown in the saveAsParquetFile call?
Thanks!
Cheng
On 3/27/15 7:35 PM, Jon Chase wrote:
https
You may simply pass in JavaSparkContext.sc
On 3/29/15 9:25 PM, Vincent He wrote:
All,
I am trying Spark SQL with Java, and I find that HiveContext does not accept
JavaSparkContext; is this true? Or is there any special build of Spark I need
to do (I built with Hive and thrift server)? Can we use HiveContext in
is not a subclass
of SparkContext.
Anyone else have any idea? I suspect this is supported now.
On Sun, Mar 29, 2015 at 8:54 AM, Cheng Lian lian.cs@gmail.com
mailto:lian.cs@gmail.com wrote:
You may simply pass in JavaSparkContext.sc
On 3/29/15 9:25 PM, Vincent He wrote:
All
You need either
|.map { row =>
  (row(0).asInstanceOf[Float], row(1).asInstanceOf[Float], ...)
}
|
or
|.map { case Row(f0: Float, f1: Float, ...) =>
  (f0, f1)
}
|
On 3/23/15 9:08 AM, Minnow Noir wrote:
I'm following some online tutorial written in Python and trying to
convert a Spark SQL table
(Move to user list.)
Hi Kannan,
You need to set |mapred.map.tasks| to 1 in hive-site.xml. The reason is
this line of code
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L68,
which overrides |spark.default.parallelism|. Also,