in Spark 1.1 it may not be as easy as in Spark 1.0, because of this commit:
https://issues.apache.org/jira/browse/SPARK-2446
after that commit only binary columns carrying the UTF8 annotation are recognized
as strings, but Impala always writes strings without the UTF8 annotation
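Just to illustrate what I mean (only a rough sketch from my side, not tested; the
column name is made up, and it assumes a table registered as parquetTable and
import sqlContext._ like in the snippets further down), such a column would then
come back as binary and has to be decoded manually on the Spark side:

// sketch only: decode an Impala-written "string" column that Spark SQL
// now reads as binary (Array[Byte]) because the UTF8 annotation is missing
val strings = sql("SELECT some_string_col FROM parquetTable").map { row =>
  row(0) match {
    case bytes: Array[Byte] => new String(bytes, "UTF-8") // binary without UTF8 annotation
    case s: String          => s                          // already recognized as a string
    case other              => other.toString
  }
}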
Hi,
I don't think anybody has been testing importing of Impala tables
directly. Is there any chance to export these first, say as
unpartitioned Hive tables, and import those instead? Just an idea.
Andre
On 07/21/2014 11:46 PM, chutium wrote:
no, something like this
14/07/20 00:19:29 ERROR
I haven't had a chance to look at the details of this issue, but we have
seen Spark successfully read Parquet tables created by Impala.
On Tue, Jul 22, 2014 at 10:10 AM, Andre Schumacher andre.sc...@gmail.com
wrote:
Hi,
I don't think anybody has been testing importing of Impala tables
Instead of using union, can you try
sqlContext.parquetFile("/user/hive/warehouse/xxx_parquet.db").registerAsTable("parquetTable")?
Then,
var all = sql("select some_id, some_type, some_time from parquetTable")
  .map(line => (line(0), (line(1).toString, line(2).toString.substring(0, 19))))
Thanks,
Yin
Hi,
unfortunately it is not so straightforward.
xxx_parquet.db
is the folder of a managed database created by Hive/Impala, so every sub-element
in it is a Hive/Impala table. Those tables are folders in HDFS, each table has a
different schema, and each folder contains one or more Parquet files.
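So instead of a single parquetFile() call, something like the following would be
needed (only a rough sketch from my side, not tested; paths and names are examples):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

// rough sketch: every sub-folder of the managed db is one table with its own
// schema, so each one has to be loaded and registered separately
val sc = new SparkContext(new SparkConf().setAppName("register impala tables"))
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val fs = FileSystem.get(sc.hadoopConfiguration)
val dbDir = new Path("/user/hive/warehouse/xxx_parquet.db")

fs.listStatus(dbDir).filter(_.isDir).foreach { tableDir =>
  val tableName = tableDir.getPath.getName           // table folder name
  sqlContext.parquetFile(tableDir.getPath.toString)  // one schema per table folder
    .registerAsTable(tableName)
}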
What's the exception you're seeing? Is it an OOM?
On Mon, Jul 21, 2014 at 11:20 AM, chutium teng@gmail.com wrote:
Hi,
unfortunately it is not so straightforward.
xxx_parquet.db
is the folder of a managed database created by Hive/Impala, so every sub-element
in it is a table in
no, something like this
14/07/20 00:19:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 2
on 02.xxx: remote Akka client disassociated
...
...
14/07/20 00:21:13 WARN scheduler.TaskSetManager: Lost TID 832 (task 1.2:186)
14/07/20 00:21:13 WARN scheduler.TaskSetManager: Loss was
like this:
val sc = new SparkContext(new SparkConf().setAppName("SLA Filter"))
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val suffix = args(0)
sqlContext.parquetFile("/user/hive/warehouse/xxx_parquet.db/xx001_"
+ ...
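The snippet is cut off here; roughly, the rest of the job does what is described
below (union of the ~30 table scans, take a few columns out, substring / filter /
groupBy, save to HDFS). Only a sketch with made-up table names, columns and output
path, continuing from the snippet above (sc, sqlContext, import sqlContext._ and
suffix as defined there):

// sketch only, not the actual job: register the ~30 table folders, select the
// same 3 columns from each, union them and do the usual transformations
val tableNames = Seq("xx001_" + suffix, "xx002_" + suffix)   // ... ca. 30 tables

val perTable = tableNames.map { name =>
  sqlContext.parquetFile("/user/hive/warehouse/xxx_parquet.db/" + name)
    .registerAsTable(name)
  sql("select some_id, some_type, some_time from " + name)
}

val all = perTable.reduce(_ unionAll _)
  .map(row => (row(0).toString, row(1).toString, row(2).toString.substring(0, 19)))
  .filter { case (_, someType, _) => someType.nonEmpty }     // some normal filtering
  .groupBy { case (_, _, timePrefix) => timePrefix }         // group by the substring
  .map { case (timePrefix, rows) => timePrefix + "\t" + rows.size }

all.saveAsTextFile("/user/xxx/sla_filter_output")            // save result to HDFS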
160G of Parquet files (ca. 30 files, snappy compressed, made by Cloudera Impala)
ca. 30 full table scans, taking 3-5 columns out, then some normal Scala
operations like substring, groupBy, filter; at the end, save as a file in HDFS
yarn-client mode, 23 cores and 60G mem / node
but it always failed!
Can you attach your code?
Thanks,
Yin
On Sat, Jul 19, 2014 at 4:10 PM, chutium teng@gmail.com wrote:
160G of Parquet files (ca. 30 files, snappy compressed, made by Cloudera
Impala)
ca. 30 full table scans, taking 3-5 columns out, then some normal Scala
operations like substring, groupBy,