Spark SQL ADD FILE gets exception: is a directory and recursive is not turned on

2016-07-07 Thread linxi zeng
Hi, all: As recorded in https://issues.apache.org/jira/browse/SPARK-16408, when using spark-sql to execute SQL like: add file hdfs://xxx/user/test; if the HDFS path (hdfs://xxx/user/test) is a directory, then we get an exception like: org.apache.spark.SparkException: Added file
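
A minimal Scala sketch of the two paths, assuming a SparkSession bound to `spark` and the hypothetical directory hdfs://xxx/user/test from the report; the SQL ADD FILE route fails on a directory, while SparkContext.addFile exposes a recursive flag:

    // SQL route from SPARK-16408: fails when the path is a directory,
    // because the underlying file fetch is not recursive.
    // spark.sql("add file hdfs://xxx/user/test")

    // Programmatic route: the Scala API lets recursion be turned on explicitly.
    spark.sparkContext.addFile("hdfs://xxx/user/test", recursive = true)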

run spark sql with script transformation failed

2016-06-27 Thread linxi zeng
Hi, all: Recently, we have been comparing Spark SQL with Hive on MR, and I tried to run Spark SQL (Spark 1.6 RC2) with a script transformation. The Spark job failed with an error message like: 16/06/26 11:01:28 INFO codegen.GenerateUnsafeProjection: Code generated in 19.054534 ms
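
For context, a minimal script-transformation sketch, assuming a SparkSession (or a HiveContext on 1.6) bound to `spark`, a hypothetical source table tmp_table, and a hypothetical script my_script.py shipped to the executors first:

    // Ship the (hypothetical) transform script to the executors.
    spark.sql("add file hdfs://xxx/scripts/my_script.py")

    // Pipe each input row through the external script and read rows back.
    val transformed = spark.sql(
      """SELECT TRANSFORM (key, value)
        |USING 'python my_script.py'
        |AS (key, value)
        |FROM tmp_table""".stripMargin)
    transformed.show()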

why did Spark 2.0 disallow ROW FORMAT with STORED AS (parquet | orc | avro, etc.)

2016-06-22 Thread linxi zeng
Hi All, I have tried Spark SQL on Spark branch-2.0 and encountered an unexpected problem: Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'orc' (line 1, pos 0). The SQL is like: CREATE TABLE IF NOT EXISTS test.test_orc ( ... ) PARTITIONED BY (xxx) ROW
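
A hedged sketch of the DDL difference, with hypothetical column and partition names; branch-2.0 rejects ROW FORMAT DELIMITED combined with a binary format like ORC, while the same statement without the ROW FORMAT clause is accepted because ORC defines its own serialization:

    // Rejected on branch-2.0 (ROW FORMAT DELIMITED + STORED AS ORC):
    // spark.sql("""CREATE TABLE IF NOT EXISTS test.test_orc (id INT, name STRING)
    //              PARTITIONED BY (dt STRING)
    //              ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    //              STORED AS ORC""")

    // Accepted: drop the ROW FORMAT DELIMITED clause.
    spark.sql("""CREATE TABLE IF NOT EXISTS test.test_orc (id INT, name STRING)
                 PARTITIONED BY (dt STRING)
                 STORED AS ORC""")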

spark sql writing orc table on viewFS throws exception

2016-05-14 Thread linxi zeng
hi, all: Recently, we have encountered a problem while using Spark SQL to write an ORC table, which is related to https://issues.apache.org/jira/browse/HIVE-10790. In order to fix this problem we decided to apply the patch from that PR to the Hive branch that Spark 1.5 relies on. We pulled the Hive branch(
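
For reference, a minimal repro sketch under the assumption that the warehouse sits on a viewfs:// mount, with a hypothetical ORC table name and a hypothetical source table tmp_table; the INSERT into the ORC table is the kind of write where the HIVE-10790-related exception surfaced:

    // Create an ORC-backed Hive table (hypothetical name) and write into it;
    // on a viewfs:// filesystem this write path is where the exception appears.
    spark.sql("CREATE TABLE IF NOT EXISTS test.test_orc_viewfs (id INT) STORED AS ORC")
    spark.sql("INSERT OVERWRITE TABLE test.test_orc_viewfs SELECT id FROM tmp_table")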

Re: spark sql job creates too many files in HDFS when doing insert overwrite hive table

2016-04-28 Thread linxi zeng
BTW, I have created a JIRA task to follow this issue: https://issues.apache.org/jira/browse/SPARK-14974 2016-04-28 18:08 GMT+08:00 linxi zeng <linxizeng0...@gmail.com>: > Hi, > > Recently, we often encounter problems using spark sql for inserting data > into a partition

spark sql job creates too many files in HDFS when doing insert overwrite hive table

2016-04-28 Thread linxi zeng
Hi, Recently, we often encounter problems using Spark SQL to insert data into a partitioned table (e.g.: insert overwrite table $output_table partition(dt) select xxx from tmp_table). After the Spark job starts running on YARN, the app creates too many files (e.g. 2,000,000+, or even 10,000,000+),
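
A hedged sketch of one mitigation to consider, using hypothetical output_table/tmp_table names in place of the templated ones above: clustering rows by the dynamic-partition column before the insert, so each partition is written by fewer tasks and the number of output files drops:

    // Route all rows of a given dt to the same tasks before writing,
    // so each dynamic partition produces far fewer files.
    spark.sql("""INSERT OVERWRITE TABLE output_table PARTITION (dt)
                 SELECT xxx, dt FROM tmp_table
                 DISTRIBUTE BY dt""")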