Range partition for parquet file?

2016-05-27 Thread Rex Xiong
Hi, I have a Spark job that outputs a DataFrame containing a column named Id, a GUID string. We will use Id to filter data in another Spark application, so it should be a partition key. I found these two methods on the Internet: 1. The DataFrame.write.save("Id") method will help, but the possible
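A hedged sketch of the usual answer here, assuming Spark 1.4+ and a hypothetical DataFrame df in spark-shell: DataFrame.write.partitionBy lays files out as one directory per key value, so a later filter on Id can prune directories instead of scanning everything. With high-cardinality GUIDs this means one folder per Id, which is worth keeping in mind. The path and GUID below are illustrative.

    import org.apache.spark.sql.functions.col

    // Write one directory per Id value: .../Id=<guid>/part-*.parquet
    df.write.partitionBy("Id").parquet("hdfs:///data/table_by_id")

    // Reading side: a filter on the partition column only touches matching folders.
    val hits = sqlContext.read.parquet("hdfs:///data/table_by_id")
      .filter(col("Id") === "3f2504e0-4f89-11d3-9a0c-0305e82c3301")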

Re: Issue of Hive parquet partitioned table schema mismatch

2015-11-06 Thread Rex Xiong
...@gmail.com>: > Is there any chance that "spark.sql.hive.convertMetastoreParquet" is turned off? > Cheng > On 11/4/15 5:15 PM, Rex Xiong wrote: >> Thanks Cheng Lian. I found in 1.5, if I use Spark to create this table with partition discovery
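For context, a minimal sketch of how that setting is inspected and toggled (1.5-era SQLContext API; the default is true): when it is on, Spark replaces the Hive Parquet SerDe with its own reader and does its own schema reconciliation, which is where mismatches like this surface.

    // Check the current value, then fall back to the Hive SerDe if needed.
    sqlContext.getConf("spark.sql.hive.convertMetastoreParquet")
    sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")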

Re: Issue of Hive parquet partitioned table schema mismatch

2015-11-03 Thread Rex Xiong
On Oct 31, 2015, at 7:38 PM, "Rex Xiong" <bycha...@gmail.com> wrote: > Add back this thread to the email list, forgot to reply all. > On Oct 31, 2015, at 7:23 PM, "Michael Armbrust" <mich...@databricks.com> wrote: >> Not that I know of. >> On Sat, Oct 31, 2015 at 12:22 PM

Re: Issue of Hive parquet partitioned table schema mismatch

2015-10-31 Thread Rex Xiong
Add back this thread to the email list, forgot to reply all. On Oct 31, 2015, at 7:23 PM, "Michael Armbrust" <mich...@databricks.com> wrote: > Not that I know of. > On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong <bycha...@gmail.com> wrote: >> Good to know that, wil

Issue of Hive parquet partitioned table schema mismatch

2015-10-30 Thread Rex Xiong
Hi folks, I have a Hive external table with partitions. Every day, an app generates a new partition day=yyyy-MM-dd stored as Parquet and runs a Hive ADD PARTITION command. In some cases, we add an additional column to new partitions and update the Hive table schema, then a query across new and old
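A hedged sketch of the daily flow described above (table name, date, and paths are illustrative, not from the thread), plus the mergeSchema read option that asks Spark to reconcile old and new Parquet schemas:

    // Register the day's folder as a new partition (HiveContext SQL).
    sqlContext.sql(
      "ALTER TABLE events ADD IF NOT EXISTS PARTITION (day='2015-10-30') " +
      "LOCATION 'hdfs:///data/events/day=2015-10-30'")

    // When partitions carry different schemas, merge them on read (Spark 1.4+ API).
    val merged = sqlContext.read.option("mergeSchema", "true")
      .parquet("hdfs:///data/events")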

Issue of jar dependency in yarn-cluster mode

2015-10-16 Thread Rex Xiong
Hi folks, In my Spark application, an executor task depends on snakeyaml-1.10.jar. I build it with Maven and it works fine: spark-submit --master local --jars d:\snakeyaml-1.10.jar ... But when I try to run it on YARN, I have an issue; it seems the Spark executor cannot find the jar file:

Re: Issue of jar dependency in yarn-cluster mode

2015-10-16 Thread Rex Xiong
I finally resolved this issue by adding --conf spark.executor.extraClassPath=snakeyaml-1.10.jar 2015-10-16 22:57 GMT+08:00 Rex Xiong <bycha...@gmail.com>: > Hi folks, > In my Spark application, an executor task depends on snakeyaml-1.10.jar > I build it with Maven and it w
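A hedged sketch of what the fix implies (jar location and app name are illustrative): --jars localizes the jar into each container's working directory, and the relative spark.executor.extraClassPath then points at that localized copy. The same setting can also be applied programmatically:

    // Illustrative submit command:
    //   spark-submit --master yarn-cluster \
    //     --jars hdfs:///libs/snakeyaml-1.10.jar \
    //     --conf spark.executor.extraClassPath=snakeyaml-1.10.jar \
    //     myapp.jar
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Relative path: resolved inside the YARN container's working directory.
      .set("spark.executor.extraClassPath", "snakeyaml-1.10.jar")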

Jar is cached in yarn-cluster mode?

2015-10-09 Thread Rex Xiong
I use "spark-submit -master yarn-cluster hdfs://.../a.jar .." to submit my app to yarn. Then I update this a.jar in HDFS, run the command again, I found a line of log that was been removed still exist in "yarn logs ". Is there a cache mechanism I need to disable? Thanks

Is it possible to disable AM page proxy in Yarn client mode?

2015-08-03 Thread Rex Xiong
In YARN client mode, the Spark driver UI URL is redirected to the YARN web proxy server, but I don't want to use this dynamic name. Is it possible to still use host:port as in standalone mode?

DESCRIBE FORMATTED doesn't work in Hive Thrift Server?

2015-07-05 Thread Rex Xiong
Hi, I try to use DESCRIBE FORMATTED for one table created in Spark, but it seems the results are all empty. I want to get metadata for the table; what are the other options? Thanks
    +------------+
    | result     |
    +------------+
    | # col_name |
    |
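One option the poster is asking for, as a minimal sketch (the table name is assumed): read the schema through the DataFrame API instead of the Thrift Server output.

    // Print column names, types, and nullability straight from the catalog.
    val df = sqlContext.table("my_table")
    df.schema.fields.foreach(f => println(s"${f.name}\t${f.dataType}\t(nullable=${f.nullable})"))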

How to get Master UI with ZooKeeper HA setup?

2015-05-11 Thread Rex Xiong
Hi, We have a 3-node master setup with ZooKeeper HA. The driver can find the master with spark://xxx:xxx,xxx:xxx,xxx:xxx, but how can I find the active Master's UI without looping through all 3 nodes? Thanks
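Absent a built-in redirect in this era, a workaround sketch under stated assumptions (hostnames are placeholders; the standalone Master web UI listens on 8080 and its /json endpoint reports whether that master is ALIVE or in STANDBY):

    import scala.io.Source

    val masters = Seq("master1", "master2", "master3")
    // Probe each master's UI and keep the one whose JSON status reads ALIVE.
    val active = masters.find { h =>
      try Source.fromURL(s"http://$h:8080/json").mkString.contains("ALIVE")
      catch { case _: Exception => false }
    }
    println(active.map(h => s"http://$h:8080").getOrElse("no ALIVE master found"))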

Re: Parquet Hive table become very slow on 1.3?

2015-04-22 Thread Rex Xiong
On Tue, Apr 21, 2015 at 1:13 AM, Rex Xiong bycha...@gmail.com wrote: We have a similar issue with massive Parquet files. Cheng Lian, could you have a look? 2015-04-08 15:47 GMT+08:00 Zheng, Xudong dong...@gmail.com: Hi Cheng, I tried both these patches, and it seems they still do not resolve my

Re: Parquet Hive table become very slow on 1.3?

2015-04-21 Thread Rex Xiong
We have a similar issue with massive Parquet files. Cheng Lian, could you have a look? 2015-04-08 15:47 GMT+08:00 Zheng, Xudong dong...@gmail.com: Hi Cheng, I tried both these patches, and it seems they still do not resolve my issue. And I found that most of the time is spent on this line in

Issue of sqlContext.createExternalTable with parquet partition discovery after changing folder structure

2015-04-04 Thread Rex Xiong
Hi Spark Users, I'm testing the new Spark 1.3 feature of Parquet partition discovery. I have 2 subfolders, each with 800 rows: /data/table1/key=1 and /data/table1/key=2. In spark-shell, I run: val t = sqlContext.createExternalTable("table1", "hdfs:///data/table1", "parquet"); t.count It shows 1600
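For the "after changing folder structure" part of the subject, one hedged follow-up (table name as above): Spark 1.3 caches Parquet metadata, so after partitions are added or moved under the table path, the cache can be refreshed instead of recreating the table.

    // Re-scan cached Parquet metadata after the folder layout changes.
    sqlContext.refreshTable("table1")
    sqlContext.table("table1").count()  // should now reflect the new layout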

Parquet timestamp support for Hive?

2015-04-03 Thread Rex Xiong
Hi, I got this error when creating a Hive table from a Parquet file: DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384 I checked HIVE-6384; it's fixed in Hive 0.14. The Hive in the Spark build is a customized
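A hedged sketch of DDL that reproduces the error on a pre-0.14 Hive (table, column, and location are illustrative); the TIMESTAMP column is what the Parquet SerDe rejects:

    // Fails with "Parquet does not support timestamp" until HIVE-6384 (Hive 0.14).
    sqlContext.sql("""
      CREATE EXTERNAL TABLE events (id STRING, event_time TIMESTAMP)
      STORED AS PARQUET
      LOCATION 'hdfs:///data/events'
    """)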

Return jobid for a hive query?

2015-03-03 Thread Rex Xiong
Hi there, I have an app talking to the Spark Hive Thrift Server using Hive ODBC, and querying is OK. But through this interface I can't get many running details when my query goes wrong; only one error message is shown. I want to get the job ID for my query so that I can go to the Application Detail UI to see what's going
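There is no job ID exposed at the ODBC layer in this era, but as a server-side sketch (the listener class name is hypothetical): a SparkListener registered in the Thrift Server's SparkContext can log every job ID as queries run, which can then be matched against the Application Detail UI.

    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

    // Log each job ID (plus its properties, which carry scheduler/group info)
    // so a failing query can be traced to a job in the UI afterwards.
    class JobIdLogger extends SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit =
        println(s"Started job ${jobStart.jobId}, props=${jobStart.properties}")
    }
    sc.addSparkListener(new JobIdLogger)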