Hi,
I have a Spark job that outputs a DataFrame containing a column named Id, which
is a GUID string.
We will use Id to filter data in another Spark application, so it should be
a partition key.
I found these two methods on the Internet:
1.
DataFrame.write.save("Id") method will help, but the possible
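For reference, since Spark 1.4 the DataFrame API supports partitioned writes directly via partitionBy. A minimal sketch, assuming a DataFrame df with the Id column and a hypothetical output path (neither is from the original mail):

```scala
// Sketch: write output partitioned by the Id column (Spark 1.4+ DataFrame API).
// Assumes "df" is an existing DataFrame; the output path is a placeholder.
df.write
  .partitionBy("Id")
  .parquet("hdfs:///data/output")
```

One caveat worth noting: a GUID column has very high cardinality, so this produces one directory per distinct Id value, which can overwhelm the metastore and filesystem; partitioning usually works best on low-cardinality keys.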
> Is there any chance that " spark.sql.hive.convertMetastoreParquet" is
> turned off?
>
> Cheng
>
> On 11/4/15 5:15 PM, Rex Xiong wrote:
>
> Thanks Cheng Lian.
> I found that in 1.5, if I use Spark to create this table with partition
> discovery
On Oct 31, 2015 at 7:38 PM, "Rex Xiong" <bycha...@gmail.com> wrote:
> Adding this thread back to the email list; forgot to reply-all.
> On Oct 31, 2015 at 7:23 PM, "Michael Armbrust" <mich...@databricks.com> wrote:
>
>> Not that I know of.
>>
>> On Sat, Oct 31, 2015 at 12:22 PM
Adding this thread back to the email list; forgot to reply-all.
On Oct 31, 2015 at 7:23 PM, "Michael Armbrust" <mich...@databricks.com> wrote:
> Not that I know of.
>
> On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong <bycha...@gmail.com> wrote:
>
>> Good to know that, wil
Hi folks,
I have a Hive external table with partitions.
Every day, an app generates a new partition day=yyyy-MM-dd stored as Parquet
and runs an add-partition Hive command.
In some cases, we add an additional column to new partitions and update the
Hive table schema, and then a query across new and old
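When partitions carry diverging Parquet schemas like this, Spark's Parquet reader can reconcile them via schema merging. A sketch, assuming a spark-shell sqlContext and an illustrative table path:

```scala
// Sketch: read a partitioned Parquet table whose newer partitions have extra columns.
// Schema merging unions the columns; rows from old partitions get null for new columns.
// Note: merging is off by default since ~Spark 1.5 for performance reasons.
val df = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("hdfs:///data/table1")
df.printSchema()  // shows the union of all partition schemas
```

Whether the Hive metastore side also sees the new column depends on the table schema registered there, which is a separate concern from Spark's own reader.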
Hi folks,
In my Spark application, the executor task depends on snakeyaml-1.10.jar.
I built it with Maven and it works fine:
spark-submit --master local --jars d:\snakeyaml-1.10.jar ...
But when I try to run it on YARN, I have an issue: it seems the Spark executor
cannot find the jar file.
I finally resolved this issue by adding --conf spark.executor.extraClassPath=
snakeyaml-1.10.jar
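Putting the two flags together, an invocation along these lines works on YARN because --jars ships the file into each executor's container working directory, while extraClassPath (with the bare file name, resolved relative to that directory) puts it on the executor classpath. The jar name is from the thread; the main class and app jar are placeholders:

```shell
# Ship the dependency to executors and add it to their classpath (YARN mode).
# com.example.MyApp and myapp.jar are illustrative, not from the original mail.
spark-submit \
  --master yarn-cluster \
  --jars /path/to/snakeyaml-1.10.jar \
  --conf spark.executor.extraClassPath=snakeyaml-1.10.jar \
  --class com.example.MyApp \
  myapp.jar
```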
2015-10-16 22:57 GMT+08:00 Rex Xiong <bycha...@gmail.com>:
> Hi folks,
>
> In my spark application, executor task depends on snakeyaml-1.10.jar
> I build it with Maven and it w
I use "spark-submit --master yarn-cluster hdfs://.../a.jar ..." to submit
my app to YARN.
Then I updated this a.jar in HDFS and ran the command again, and found that a
log line that had been removed still shows up in "yarn logs".
Is there a cache mechanism I need to disable?
Thanks
In YARN client mode, the Spark driver URL is redirected to the YARN web proxy
server, but I don't want to use this dynamic name. Is it possible to still
use host:port as in standalone mode?
Hi,
I tried to use it for one table created in Spark, but it seems the results are
all empty. I want to get metadata for the table; what are the other options?
Thanks
+-----------+
|     result|
+-----------+
| # col_name|
|
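If DESCRIBE output comes back empty, the schema is also reachable programmatically. A sketch of alternatives, assuming a spark-shell sqlContext (or HiveContext for the Hive-style command) and an illustrative table name:

```scala
// Alternatives for inspecting table metadata when DESCRIBE output is empty:
sqlContext.table("table1").printSchema()  // column names and types as a tree
sqlContext.table("table1").schema.fields
  .foreach(f => println(s"${f.name}: ${f.dataType}"))  // one line per column
// Hive-style detail (location, SerDe, partitions) requires a HiveContext:
sqlContext.sql("DESCRIBE FORMATTED table1").collect().foreach(println)
```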
Hi,
We have a 3-node master setup with ZooKeeper HA.
Driver can find the master with spark://xxx:xxx,xxx:xxx,xxx:xxx
But how can I find out the active Master's UI without looping through all 3
nodes?
Thanks
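One common workaround: each standalone Master serves a JSON status endpoint on its web UI port, so you can poll the candidate hosts and pick the one reporting ALIVE. Host names and port below are placeholders:

```shell
# Query each master's web UI JSON endpoint; the active master reports
# "status" : "ALIVE", standbys report "STANDBY". Hostnames/port are placeholders.
for h in master1 master2 master3; do
  echo "$h: $(curl -s http://$h:8080/json | grep -o '"status" *: *"[A-Z]*"')"
done
```

This is still a loop, but it can be scripted once rather than checked by hand, and the same endpoint exposes worker and application state.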
On Tue, Apr 21, 2015 at 1:13 AM, Rex Xiong bycha...@gmail.com wrote:
We have a similar issue with massive Parquet files. Cheng Lian, could
you have a look?
2015-04-08 15:47 GMT+08:00 Zheng, Xudong dong...@gmail.com:
Hi Cheng,
I tried both these patches, but it seems they still do not resolve my
We have a similar issue with massive Parquet files. Cheng Lian, could you
have a look?
2015-04-08 15:47 GMT+08:00 Zheng, Xudong dong...@gmail.com:
Hi Cheng,
I tried both these patches, but it seems they still do not resolve my issue. And I
found that most of the time is spent on this line in
Hi Spark Users,
I'm testing the new Parquet partition discovery feature in 1.3.
I have 2 sub folders, each has 800 rows.
/data/table1/key=1
/data/table1/key=2
In spark-shell, run this command:
val t = sqlContext.createExternalTable("table1", "hdfs:///data/table1",
"parquet")
t.count
It shows 1600
Hi,
I got this error when creating a Hive table from a Parquet file:
DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.UnsupportedOperationException: Parquet does not support
timestamp. See HIVE-6384
I checked HIVE-6384; it is fixed in Hive 0.14.
The Hive in the Spark build is a customized
Hi there,
I have an app talking to Spark Hive Server using Hive ODBC, and querying is OK.
But through this interface, I can't get many running details when my query goes
wrong; only one error message is shown.
I want to get the job id for my query, so that I can go to the Application
Detail UI to see what's going