hadoop.ParquetOutputCommitter: could not write summary file

2016-03-29 Thread
An error occurred when writing Parquet files to disk. Any advice? I want to know the reason. Thanks. ``` 16/03/29 18:31:48 WARN hadoop.ParquetOutputCommitter: could not write summary file for file:/tmp/goods/2015-6 java.lang.NullPointerException at
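A hedged sketch, not from the thread: one commonly cited workaround for the "could not write summary file" warning is to disable Parquet summary metadata on the Hadoop configuration before writing. Input path, output mode, and app name below are illustrative assumptions.
```
# Sketch only: disable Parquet summary (_metadata) files before writing.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="parquet-summary-example")
# parquet.enable.summary-metadata is the parquet-hadoop switch for summary files.
sc._jsc.hadoopConfiguration().set("parquet.enable.summary-metadata", "false")

sql_context = SQLContext(sc)
df = sql_context.read.json("file:/tmp/goods/input.json")  # hypothetical input
df.write.mode("overwrite").parquet("file:/tmp/goods/2015-6")
```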

How to read compressed parquet file

2015-09-09 Thread
I think too many Parquet files may affect read performance, so I used hadoop archive to combine them, but sql_context.read.parquet(output_path) does not work on the resulting file. How can I fix it? Please help me. :)

Re: How to read compressed parquet file

2015-09-09 Thread
It works on Spark 1.4. Thanks a lot. 2015-09-09 17:21 GMT+08:00 Cheng Lian <lian.cs@gmail.com>: > You need to use "har://" instead of "hdfs://" to read HAR files. Just > tested against Spark 1.5, and it works as expected. > > Cheng > > > On
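A minimal sketch of the fix described above; the archive name, namenode address, and paths are illustrative assumptions, not values from the thread.
```
# The HAR itself would be built with something like:
#   hadoop archive -archiveName goods.har -p /user/data/parquet /user/data
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="read-har-parquet")
sql_context = SQLContext(sc)

# Reading through hdfs:// only shows the raw HAR container files:
#   sql_context.read.parquet("hdfs://namenode:8020/user/data/goods.har")  # fails
# Reading through har:// exposes the archived Parquet files transparently:
df = sql_context.read.parquet("har://hdfs-namenode:8020/user/data/goods.har")
df.show()
```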

Differences in loading data using the Spark data source API versus JDBC

2015-08-10 Thread
Hi, everyone. I have a question about loading data: which is more efficient, using the Spark data source API or using JDBC?

Differences in loading data

2015-08-10 Thread
What are the differences between loading data using JDBC and loading data using the Spark data source API? Or between loading data using mongo-hadoop and loading data using the native Java driver? Which way is better?

Re: How to increase the number of tasks

2015-06-05 Thread
Did you change the value of 'spark.default.parallelism'? Try a bigger number. 2015-06-05 17:56 GMT+08:00 Evo Eftimov evo.efti...@isecc.com: It may be that your system runs out of resources (ie 174 is the ceiling) due to the following 1. RDD Partition = (Spark) Task 2.

Re: How to increase the number of tasks

2015-06-05 Thread
Just multiply 2-4 by the CPU core count of the node. 2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com: I did not change spark.default.parallelism. What is the recommended value for it? On Fri, Jun 5, 2015 at 3:31 PM, 李铖 lidali...@gmail.com wrote: Did you change the value
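A minimal sketch of the rule of thumb above (2-4x the total CPU cores); the core and node counts are illustrative assumptions, and the input path is hypothetical.
```
from pyspark import SparkConf, SparkContext

cores_per_node = 8
num_nodes = 3
parallelism = cores_per_node * num_nodes * 3   # somewhere in the 2-4x range

conf = (SparkConf()
        .setAppName("parallelism-example")
        .set("spark.default.parallelism", str(parallelism)))
sc = SparkContext(conf=conf)

# The setting applies to shuffles and RDDs that don't specify a partition
# count; a partition count can also be requested per operation:
rdd = sc.textFile("hdfs:///data/input.txt", minPartitions=parallelism)
print(rdd.getNumPartitions())
```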

'Java heap space' error occurred when querying a 4G data file from HDFS

2015-04-07 Thread
In my dev-test env I have 3 virtual machines; every machine has 12G memory and 8 CPU cores. Here are spark-defaults.conf and spark-env.sh. Maybe some config is not right. I run this command: *spark-submit --master yarn-client --driver-memory 7g --executor-memory 6g /home/hadoop/spark/main.py*

Re: 'Java heap space' error occurred when querying a 4G data file from HDFS

2015-04-07 Thread
Any help, please? Help me find the right configuration. 李铖 lidali...@gmail.com wrote on Tuesday, April 7, 2015: In my dev-test env I have 3 virtual machines; every machine has 12G memory and 8 CPU cores. Here are spark-defaults.conf and spark-env.sh. Maybe some config is not right. I run this command: *spark-submit
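A hedged sketch only, not the thread's confirmed fix: on 12G / 8-core nodes, leaving headroom for the OS and YARN container overhead (rather than giving 7g to the driver and 6g to each executor) is a common starting point. All numbers are illustrative, the input path is hypothetical, and the reader call assumes the Spark 1.4+ DataFrame API.
```
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("heap-space-example")
        .set("spark.executor.memory", "4g")                 # leave room for OS and other services
        .set("spark.yarn.executor.memoryOverhead", "512")   # MB of off-heap cushion per container
        .set("spark.executor.cores", "4"))

sc = SparkContext(conf=conf)
sql_context = SQLContext(sc)
df = sql_context.read.json("hdfs:///data/big-file.json")    # hypothetical 4G input
df.registerTempTable("t")
sql_context.sql("SELECT count(*) FROM t").show()
```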

Missing an output location for shuffle. : (

2015-03-26 Thread
Again, when I run a Spark SQL query on a larger file, an error occurred. Has anyone fixed this? Please help me. Here is the stack trace. org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 at

Re: Missing an output location for shuffle. : (

2015-03-26 Thread
172.100.11.250 80:71:7a:95:48:a2 1364021 seconds 2015-03-26 23:01 GMT+08:00 Michael Armbrust mich...@databricks.com: I would suggest looking for errors in the logs of your executors. On Thu, Mar 26, 2015 at 3:20 AM, 李铖 lidali...@gmail.com wrote: Again, when I run a Spark SQL query on a larger file, an error

Spark-sql query got exception.Help

2015-03-25 Thread
It is OK when I query data from a small HDFS file, but if the HDFS file is 152 MB I get this exception. I tried 'sc.setSystemProperty('spark.kryoserializer.buffer.mb', '256')' but the error persists. ``` com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 39135 at

Re: Spark-sql query got exception.Help

2015-03-25 Thread
the 2nd one is larger (it seems that Kryo doesn’t check for it). Cheng On 3/25/15 7:31 PM, 李铖 wrote: Here is the full stack trace 15/03/25 17:48:34 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, cloud1): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required

Re: Spark-sql query got exception.Help

2015-03-25 Thread
the full stack trace? On 3/25/15 6:26 PM, 李铖 wrote: It is OK when I query data from a small HDFS file, but if the HDFS file is 152 MB I get this exception. I tried 'sc.setSystemProperty('spark.kryoserializer.buffer.mb', '256')' but the error persists. ``` com.esotericsoftware.kryo.KryoException

Re: Spark-sql query got exception.Help

2015-03-25 Thread
$WriterThread.run(PythonRDD.scala:203) 2015-03-26 10:39 GMT+08:00 李铖 lidali...@gmail.com: Yes, it works after I added the two properties to spark-defaults.conf. As I program in Python on the Spark platform, the Python API does not have a SparkConf API. Thanks. 2015-03-25 21:07 GMT+08:00 Cheng Lian
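A minimal sketch assuming the two properties referenced above are the Spark 1.3-era Kryo buffer settings (spark.kryoserializer.buffer.mb and spark.kryoserializer.buffer.max.mb). The thread put them in spark-defaults.conf; the same values can also be set programmatically via pyspark's SparkConf, as sketched here. The buffer sizes are illustrative.
```
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("kryo-buffer-example")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryoserializer.buffer.mb", "256")        # initial buffer size, in MB
        .set("spark.kryoserializer.buffer.max.mb", "512"))   # ceiling the buffer may grow to

sc = SparkContext(conf=conf)
```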

Should I do spark-sql query on HDFS or apache hive?

2015-03-17 Thread
Hi, everybody. I am new to Spark. Now I want to do interactive SQL queries using Spark SQL. Spark SQL can run on Hive or by loading files from HDFS. Which is better or faster? Thanks.

Re: Should I do spark-sql query on HDFS or apache hive?

2015-03-17 Thread
Even Hive tables are usually read from files in HDFS. You should probably use HiveContext, as its query language is more powerful than SQLContext's. Also, Parquet is usually the fastest data format for Spark SQL. On Tue, Mar 17, 2015 at 3:41 AM, 李铖 lidali...@gmail.com wrote: Hi, everybody. I am
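A minimal sketch of the suggestion above: use HiveContext (a superset of SQLContext with a richer SQL dialect) and keep the data in Parquet. Table and path names are hypothetical, and the reader call assumes the Spark 1.4+ DataFrame API.
```
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hivecontext-example")
hive_context = HiveContext(sc)

# Query Parquet files straight from HDFS ...
df = hive_context.read.parquet("hdfs:///warehouse/events.parquet")
df.registerTempTable("events")
hive_context.sql("SELECT count(*) FROM events").show()

# ... or query an existing Hive table by name through the same context:
# hive_context.sql("SELECT count(*) FROM my_hive_db.events").show()
```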